Abstract

We study the performance of Reed-Solomon (RS) codes for the exact repair problem in distributed 
storage. Our main result is that, in some parameter regimes, Reed-Solomon codes are optimal regenerating
codes, among MDS codes with linear repair schemes. Moreover, we give a characterization of MDS
codes with linear repair schemes which holds in any parameter regime, and which can be used to give 
non-trivial repair schemes for RS codes in other settings. 

More precisely, we show that for k-dimensional RS codes whose evaluation points are a finite field of
size n, there are exact repair schemes with bandwidth (n - 1) log((n - 1)/(n - k)) bits, and that this is
optimal for any MDS code with a linear repair scheme. In contrast, the naive (commonly implemented)
repair algorithm for this RS code has bandwidth k log(n) bits. When the entire field is used as evaluation
points, the number of nodes n is much larger than the number of bits per node (which is O(log(n))),
and so this result holds only when the degree of sub-packetization is small. However, our method applies
in any parameter regime, and to illustrate this for high levels of sub-packetization we give an improved
repair scheme for a specific (14,10)-RS code used in the Facebook Hadoop Analytics cluster.


1 Introduction 

This paper studies a polynomial interpolation problem which arises from the use of Reed-Solomon codes in 
distributed storage systems. In such systems, a large file is encoded and distributed over many nodes. When 
a node fails, we would like to be able to set up a replacement node efficiently using information from the 
remaining functional nodes. The problem of recovering the failed node exactly is known as the exact repair 
problem. 

One traditional solution to the exact repair problem has been to use maximum-distance-separable (MDS)
codes, and in particular Reed-Solomon (RS) codes. The RS solution goes as follows. The original file is broken
up into $k$ blocks, and each block is viewed as an element of a finite field $F$. We interpret the file as a degree
$k - 1$ polynomial $f$ over $F$: each block is a coefficient of the polynomial. To distribute the file over the
nodes, we choose $n$ points $\alpha_1, \ldots, \alpha_n \in F$, and send $f(\alpha_i)$ to node $i$. Now, if a node fails, we may recover it
by looking at the information on any $k$ remaining nodes. This follows because any $k$ evaluations $f(\alpha_i)$ of a
degree $k - 1$ polynomial determine the entire polynomial, and hence the contents of the failed node.
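The traditional scheme just described can be sketched in a few lines. This is only an illustration under stated assumptions: the field is taken to be the small prime field GF(17) (the text allows any finite field $F$), and the helper names `encode` and `naive_repair` are hypothetical, not taken from the paper.

```python
# Illustrative sketch only. Assumptions: F = GF(17); helper names are hypothetical.
P = 17  # a small prime, so the integers mod P form a finite field


def encode(blocks, points):
    """Node i stores f(points[i]), where f(X) = sum_i blocks[i] * X^i over GF(P)."""
    return [sum(c * pow(a, i, P) for i, c in enumerate(blocks)) % P for a in points]


def naive_repair(known, target):
    """Lagrange-interpolate f(target) from k pairs (alpha, f(alpha))."""
    total = 0
    for j, (aj, yj) in enumerate(known):
        num, den = 1, 1
        for m, (am, _) in enumerate(known):
            if m != j:
                num = num * (target - am) % P
                den = den * (aj - am) % P
        # pow(den, P - 2, P) is the inverse of den modulo the prime P
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total


blocks = [3, 1, 4, 1]            # the file: k = 4 coefficients of f
points = list(range(1, 11))      # n = 10 distinct evaluation points
nodes = encode(blocks, points)

failed = 6                       # index of the failed node
helpers = [(points[i], nodes[i]) for i in (0, 2, 5, 9)]   # any k surviving nodes
recovered = naive_repair(helpers, points[failed])
```

Note that the replacement node reads $k$ full symbols of $F$ here, which is exactly the cost that the rest of the paper tries to reduce.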

This is a non-trivial solution to the exact repair problem, but it’s not ideal. Unfortunately for Reed-Solomon
codes, k nodes are also necessary in this framework, even if all we want to recover is the single
failed node. This is wasteful: we have to read k symbols from F when we only want one. 

Recently, a new approach has emerged, using regenerating codes. In this framework, we still use an MDS 
code to encode the file onto n nodes. However, a replacement node may choose to download only part of the 
contents of each surviving node, rather than being required to download the entire node. That is, we break 
up the symbols from $F$ into $t$ sub-symbols in some smaller field $B$ (for example, $B = \{0,1\}$). The node is
allowed to do some local computation and return one or more sub-symbols, and our goal is to download as
few sub-symbols as possible. It turns out that one can do significantly better by downloading fewer
sub-symbols from more nodes than with the traditional solution of downloading all of the sub-symbols from each
of $k$ nodes. The number of sub-symbols downloaded in the worst case is called the (exact) repair bandwidth
of the code, over $B$. The exact repair problem and regenerating codes were first introduced in [DGW+10],
and have seen a great deal of work since then. See [DRWS11] for an excellent survey, and the University of
Texas distributed storage wiki [UT] for more up-to-date information and references.

Reed-Solomon codes are a bit maligned in the regenerating codes literature. A typical paper on regenerating
codes (including this one) will mention within the first few paragraphs why the traditional approach
with Reed-Solomon codes is not a good idea for the exact repair problem. Indeed, by now we know of several
(non-RS) MDS codes which outperform the traditional RS approach. Nonetheless, RS codes are still used!
Because of their ubiquity, it is important to understand what can and cannot be done with Reed-Solomon
codes; just because the traditional RS approach isn’t a good idea, that does not mean that there isn’t some
better way to use RS codes. This was asked as an open question in [DRWS11], and is the subject of this
paper.

Our contributions. We study the exact repair problem for Reed-Solomon codes, and show that one can 
do much better than the naive scheme. In fact, we show that high (constant) rate Reed-Solomon codes 
with evaluation points in the whole field are optimal among linear schemes. Moreover, to the best of 
our knowledge, they can significantly outperform all existing constructions in the same parameter regime. 
Subsequent work by Ye and Barg [YB16] has used our framework to give constructions of RS codes which 
are optimal in other parameter regimes as well (those with high sub-packetization). 

More precisely, our contributions are as follows. 

(1) Exact repair schemes for high-rate Reed-Solomon codes. We show that the repair bandwidth
for a rate-$(1 - \epsilon)$ Reed-Solomon code over a field $F$ and length $n = |F|$ is at most $n - 1$, over a subfield
$B$ of size $1/\epsilon$, for infinitely many $(n, \epsilon)$. In particular, this implies that we can solve the exact repair
problem over $GF(2)$ for high-rate Reed-Solomon codes with repair bandwidth $O(n)$ bits. Previous
constructions of MDS codes with a similar degree of sub-packetization (that is, the number of bits per
symbol) require bandwidth $\Omega(n \log(n))$ bits.

(2) A matching lower bound. It is easy to see that $k = (1 - \epsilon)n$ is a lower bound for the repair
bandwidth for any MDS code in this setting, and thus our scheme is optimal up to constant factors. 
However, we can prove an even stronger lower bound for linear schemes. We show that our scheme in 
(1) is optimal for linear repair schemes for MDS codes, even up to the leading constant. 

(3) A characterization of linear repair schemes, with examples. We give a characterization for
linear exact repair schemes of MDS codes, in terms of the dual code. For Reed-Solomon codes (whose
dual is again a generalized Reed-Solomon code), this gives a very natural way to think about
constructing and analyzing codes. Indeed, it is through this characterization that we prove (1) and (2).
Moreover, this characterization is useful to construct schemes for arbitrary RS codes. We give two
further examples of applications to RS codes, beyond (1) and (2). In the first, we give a non-trivial
construction for a family of RS codes where the length $n$ of the code is small compared to $|F|$; allowing
for larger field sizes can add flexibility in practice. In our second example, we consider a specific RS
code, used in the Facebook Hadoop Analytics cluster. This particular code has been analyzed
before [SAP+13, SPDC14], and using our characterization we are able to find a repair scheme (with the
help of a computer) that outperforms the best known repair scheme for this code.

We remark that in subsequent work, [YB16] has used our characterization to find RS codes in this
parameter regime (where $n$ is small compared to $|F|$) which are optimal regenerating codes.

To the best of our knowledge, ours is the first work to systematically study the repair bandwidth of RS codes
for general $k$. In [SPDC14], the authors give a framework for studying the repair bandwidth of linear MDS
codes over finite fields, and as an example they analyze a few specific small RS codes. Surprisingly (and
this surprise was the inspiration for our work here), they show that for these small codes, one can do better
than the naive repair scheme. In this work we give a more general result, for all $k$, using different techniques.
We will survey the related work in the next section.

Organization. In Section 2 we set up notation and survey related work. In Section 3 we state our results
in more detail, and give an outline of the proofs, which are contained in Sections 4, 5, and 6. We conclude in
Section 7 with some open questions.

2 Set-up and Related Work 

In this section we set up our definitions for Reed-Solomon and regenerating codes, and survey related work. 
We note that, in the regenerating codes literature, it is common to use the Greek letters $\alpha, \beta, \gamma$ for parameters
of the code. We prefer to reserve Greek letters for elements of the field $F$ and use Roman letters (like $t$, $b$) for the
parameters. For convenience, Table 2 in the appendix offers a translation between our notation and common
notation in the regenerating codes literature.

2.1 Codes and Reed-Solomon Codes 

A code $C$ over a field $F$ of length $n$ is a subset $C \subseteq F^n$. We will view an element of $F^n$ as an $F$-valued
function over a domain $A$ of size $n$. Thus, a code is a collection of functions $\mathcal{F}$ from $A = \{\alpha_1, \ldots, \alpha_n\}$ into
$F$: the code $C \subseteq F^n$ determined by $\mathcal{F}$ and $A$ is

$$C = \{(f(\alpha_1), f(\alpha_2), \ldots, f(\alpha_n)) : f \in \mathcal{F}\}.$$

In this work, we will often abuse notation and write $f \in C$, to mean that the evaluation vector $(f(\alpha_1), \ldots, f(\alpha_n))$
is in $C$. The number of evaluation points $n$ is called the block length of the code. In this work, we study
linear codes, i.e., those where $\mathcal{F}$ forms an $F$-vector space, and so $C$ forms a subspace of $F^n$. For a linear
code, the dimension $k$ is the dimension of this subspace, and the rate $r$ is defined as the ratio $k/n$. We refer
to a function $f \in \mathcal{F}$ as a message, and a corresponding vector $(f(\alpha_1), \ldots, f(\alpha_n)) \in C$ as a codeword. For
$c \in F^n$, we refer to the components $c_i \in F$ of $c$ as symbols.

A Reed-Solomon code is the linear code formed when $\mathcal{F}$ is a set of low-degree polynomials, and $A \subseteq F$
is some set of evaluation points.

Definition 1. The Reed-Solomon code $RS(A, k) \subseteq F^n$ of dimension $k$ over a finite field $F$ with evaluation
points $A = \{\alpha_1, \alpha_2, \ldots, \alpha_n\} \subseteq F$ is the set

$$RS(A, k) = \{(f(\alpha_1), f(\alpha_2), \ldots, f(\alpha_n)) : f : F \to F \text{ is a polynomial of degree at most } k - 1\}.$$

Reed-Solomon codes are Maximum Distance Separable (MDS) codes, which means that any k symbols 
(that is, evaluations of a polynomial /) can be used to recover the entire codeword (that is, / itself). 

Definition 2. A linear code $C$, given by $\mathcal{F}$, $A$, is Maximum Distance Separable (MDS) if the minimum
distance of the code is the maximum possible, that is, if

$$\min_{f \in \mathcal{F} \setminus \{0\}} |\{\alpha \in A : f(\alpha) \neq 0\}| = n - k + 1.$$

In particular, in an MDS code, any $k$ symbols $f(\alpha)$ are enough to determine $f$ and hence the entire
codeword. Conversely, $k$ symbols are necessary to determine $f$: given only $k - 1$ symbols, a remaining
symbol $f(\alpha^*)$ could be any element of $F$.
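As a quick sanity check of the MDS property for a Reed-Solomon code, the following sketch enumerates every nonzero message of a tiny code and confirms that the minimum weight is $n - k + 1$. The parameters (the field GF(7), six evaluation points, $k = 3$) and the function names are illustrative assumptions, chosen only to keep the brute-force search small.

```python
# Illustrative brute-force check; parameters and names are assumptions.
from itertools import product

P = 7                      # F = GF(7)
A = [0, 1, 2, 3, 4, 5]     # n = 6 distinct evaluation points
K = 3                      # dimension: polynomials of degree at most K - 1


def codeword(coeffs):
    """Evaluation vector of f(X) = sum_i coeffs[i] * X^i at the points in A."""
    return [sum(c * pow(a, i, P) for i, c in enumerate(coeffs)) % P for a in A]


def min_distance():
    """Minimum number of nonzero symbols over all nonzero codewords."""
    weights = [sum(1 for s in codeword(m) if s != 0)
               for m in product(range(P), repeat=K) if any(m)]
    return min(weights)


n, k = len(A), K
distance = min_distance()
```

The check works because a nonzero polynomial of degree at most $k - 1$ has at most $k - 1$ roots, so every nonzero codeword has at least $n - k + 1$ nonzero symbols.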

Our characterization of linear repair schemes will go through the dual code, $C^\perp$:

Definition 3. Suppose $C$ is a linear code given by $\mathcal{F}$, $A$. The dual code of $C$ is

$$C^\perp = \left\{ g : F \to F \;:\; \sum_{\alpha \in A} f(\alpha) g(\alpha) = 0 \text{ for all } f \in \mathcal{F} \right\}.$$

The dual of an MDS code is again MDS.

Remark 1 (Non-standard notation for MDS codes). While it is common to view Reed-Solomon codes as
sets of functions $f : F \to F$, it is not standard to think of general MDS codes this way. In particular, we
index the positions $1, \ldots, n$ of the codeword by evaluation points $\alpha_1, \ldots, \alpha_n$, even though for general MDS
codes there may not be a natural choice of such evaluation points. Our main theorems are general and apply
to all MDS codes, but the primary motivation of this work is Reed-Solomon Codes. Further, in the context
of Reed-Solomon Codes, the choice of evaluation points is the crux of our constructions. For these reasons,
we stick with the Reed-Solomon-inspired notation throughout the paper.


2.2 Exact Repair Problem and Regenerating Codes 

Recall the exact repair problem from the introduction: a file, consisting of k blocks, is encoded into n nodes. 
The goal is to recover the contents of an erased node by downloading some information from the remaining 
nodes. In the language of MDS codes as above, the file is a function $f \in \mathcal{F}$, which can be represented as $k$
symbols from a finite field $F$. Each of the $n$ nodes is associated with an evaluation point $\alpha \in A$, and it stores
$f(\alpha)$. For an arbitrary $\alpha^* \in A$ (corresponding to an erased node), the goal is to recover $f(\alpha^*)$ given some
information from $f(\alpha)$ for $\alpha \in A \setminus \{\alpha^*\}$. Crucially, we may choose to download only part of each symbol
$f(\alpha) \in F$, meaning that a node may return fewer than $\log_2(|F|)$ bits of information when queried. More
precisely, each node may return some number of sub-symbols. A sub-symbol is an element of some “base”
set $B$ which is smaller than $F$, for example, $B = \{0,1\}$. While in principle a node’s response can be an
arbitrary function of its contents, in this work we focus on linear repair schemes. That is, we assume that
$B \le F$ is a subfield, we view $F$ as a vector space over $B$, and we assume that each node $\alpha$ may return any
$B$-linear function of its contents $f(\alpha)$. The $B$-linear transformations from $F$ to $B$ are precisely the trace
functionals $L_\gamma : F \to B$ given by $L_\gamma(\beta) = \mathrm{tr}_{F/B}(\gamma \beta)$. Here, $\mathrm{tr}_{F/B}$ is the field trace of $F$ over $B$:

Definition 4. Let $F = GF(q^t)$ be a finite field extension of $B = GF(q)$ of degree $t$. The field trace is defined
as

$$\mathrm{tr}_{F/B}(\beta) = \beta + \beta^{q} + \beta^{q^2} + \cdots + \beta^{q^{t-1}}.$$
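To make Definition 4 concrete, here is a small sketch that evaluates the trace formula in $GF(16)$, once over $B = GF(2)$ and once over $B = GF(4)$. The representation of $GF(16)$ as 4-bit integers modulo $x^4 + x + 1$ and the function names are assumptions made for illustration.

```python
# Illustrative sketch. Assumption: GF(16) is represented as 4-bit integers,
# with multiplication modulo the irreducible polynomial x^4 + x + 1.
MOD = 0b10011


def gf16_mul(a, b):
    """Carry-less multiplication of a and b, reduced modulo x^4 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 16:
            a ^= MOD
    return r


def gf16_pow(a, e):
    """a raised to the e-th power in GF(16), by repeated multiplication."""
    r = 1
    for _ in range(e):
        r = gf16_mul(r, a)
    return r


def trace(beta, q, t):
    """tr(beta) = beta + beta^q + ... + beta^(q^(t-1)), with q^t = 16."""
    total, term = 0, beta
    for _ in range(t):
        total ^= term              # addition in characteristic 2 is XOR
        term = gf16_pow(term, q)
    return total


bits = [trace(beta, 2, 4) for beta in range(16)]        # GF(16) over GF(2)
quaternary = [trace(beta, 4, 2) for beta in range(16)]  # GF(16) over GF(4)
subfield = [c for c in range(16) if gf16_pow(c, 4) == c]  # the copy of GF(4)
```

In both cases the trace lands in the subfield $B$ and is $B$-linear, which is what lets a node answer a query with a single sub-symbol.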

Thus, in a linear repair scheme, the node corresponding to $\alpha$ returns zero or more elements of $B$ of the
form $L_\gamma(f(\alpha))$. A linear exact repair scheme can then be described by the field elements $\gamma$ that are used in
each trace functional, along with a (linear) repair algorithm. We give a precise definition below.

Definition 5 (Linear exact repair scheme). Let $C$ be a linear code over $F$ of length $n$ and dimension $k$,
given by a collection of functions $\mathcal{F}$ and a set of evaluation points $A$. A linear exact repair scheme for $C$
over a subfield $B \le F$ consists of the following.

• For each $\alpha^* \in A$, and for each $\alpha \in A \setminus \{\alpha^*\}$, a set of queries $Q_\alpha(\alpha^*) \subseteq F$.

• For each $\alpha^* \in A$, a linear reconstruction algorithm that computes

$$f(\alpha^*) = \sum_{i=1}^{t} \lambda_i \nu_i$$

for coefficients $\lambda_i \in B$ and a basis $\nu_1, \ldots, \nu_t$ for $F$ over $B$, so that the coefficients $\lambda_i$ are $B$-linear
combinations of the queries

$$\bigcup_{\alpha \in A \setminus \{\alpha^*\}} \{\mathrm{tr}_{F/B}(\gamma f(\alpha)) : \gamma \in Q_\alpha(\alpha^*)\}.$$

The repair bandwidth $b$ of the exact repair scheme is the total number of sub-symbols in $B$ returned by the
nodes $\alpha$, in the worst case over the failed node $\alpha^*$:

$$b = \max_{\alpha^* \in A} \sum_{\alpha \in A \setminus \{\alpha^*\}} |Q_\alpha(\alpha^*)|.$$

[Figure 1 diagram: a file of $k$ symbols from $F$, each made up of $t$ symbols from $B$, is interpreted by an MDS code as a function $f : A \to F$ and mapped to $(f(\alpha_1), \ldots, f(\alpha_n))$; node 1 stores $f(\alpha_1) \in F$.]

Figure 1: Setup for a linear exact repair scheme. $B \le F$ is a subfield of $F$, and $F$ is a $t$-dimensional vector
space over $B$. For a Reed-Solomon code, the encoding maps a file $f = (\beta_0, \beta_1, \ldots, \beta_{k-1}) \in F^k$ to evaluations
of the polynomial $f(X) := \sum_{i=0}^{k-1} \beta_i X^i$. The repair bandwidth is the number of symbols from $B$ that the
replacement node needs to download in order to reconstruct $f(\alpha^*)$.


The repair locality of the exact repair scheme is the number of $\alpha$ which are required to respond:

$$d = \max_{\alpha^* \in A} \sum_{\alpha \in A \setminus \{\alpha^*\}} \mathbf{1}_{Q_\alpha(\alpha^*) \neq \emptyset}.$$

We will define

$$t = \log_{|B|}(|F|)$$

to be the dimension of $F$ as a vector space over $B$. Thus, we can view each symbol from $F$ as a vector of $t$
sub-symbols from $B$.

We illustrate the setup for Definition 5 in Figure 1.
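The two quantities $b$ and $d$ are simple functions of the query sets. The sketch below computes them from a table of query sets; the table itself is a made-up toy (it is not claimed to be a valid repair scheme for any code), and the function name is hypothetical.

```python
# Illustrative sketch. The query table is toy data, not a real repair scheme.
def bandwidth_and_locality(queries):
    """queries[failed][helper] is the set Q_helper(failed) of queried field elements."""
    b = max(sum(len(q) for q in per_helper.values())
            for per_helper in queries.values())
    d = max(sum(1 for q in per_helper.values() if q)
            for per_helper in queries.values())
    return b, d


# Four nodes labelled 0..3; field elements are written as small integers.
queries = {
    0: {1: {1}, 2: {1, 2}, 3: set()},   # node 3 is not contacted when node 0 fails
    1: {0: {3}, 2: {1}, 3: {2}},
    2: {0: {1}, 1: set(), 3: {1, 2}},
    3: {0: {2}, 1: {1}, 2: {3}},
}
b, d = bandwidth_and_locality(queries)
```

Here every failure costs three sub-symbols, but only some failures contact all three surviving nodes, so bandwidth and locality are taken as separate maxima.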


What we care about. There are several parameters of interest in the use of MDS regenerating codes for 
storage. The three that we focus on in this work are the rate k/n, the repair bandwidth b, and the size of 
the base field B. We would like the rate to be as large as possible, ideally approaching 1; this means that 
we minimize storage overhead. We would also like to minimize the number of bits $b \log_2(|B|)$ downloaded by
the replacement node; this means we would like to minimize the communication from the remaining nodes 
to the replacement node in the repair process. 

Before we discuss related work and the use of Reed-Solomon codes for the exact repair problem, we make 
a few remarks about our definitions and goals, and their relationship to the regenerating codes literature. 

Remark 2 (Measuring bandwidth in bits). We focus on the quantity $b \log_2(|B|)$ rather than on just $b$ for the
following reason. When $B = F$, it is trivial to obtain $b = k$, which is clearly optimal. However, this is also
clearly not a good solution, as it is the same as the traditional RS approach from the introduction. Focusing
on $b \log_2(|B|)$ means that we always measure bits, rather than symbols of some possibly-large subfield $B$.

Remark 3 (Direction of communication). The definition of repair bandwidth above only counts communication
from the remaining nodes to the replacement node. The astute reader will have noticed that the
replacement node must also communicate to the remaining nodes! Indeed, the remaining nodes must know
the identity $\alpha^*$ of the erased node (or at least know what function of their contents they are supposed to
return). While this replacement-to-remaining-nodes direction of communication is important, in practice the
cost of this communication is negligible compared to the remaining-nodes-to-replacement direction that is
captured in the definition of repair bandwidth. We elaborate more on this point in the appendix. For now, we
just point out that the regenerating codes literature focuses on this one-way definition of repair bandwidth, and
we also adopt this definition in our work.

Remark 4 (MSR codes). In the regenerating codes literature, the size of the blocks in the file needn’t be 
the same as the storage capacity of the nodes, and there is a beautiful theory investigating the trade-offs 
this involves. Because we are interested in Reed-Solomon codes, which have the same message alphabet and 
codeword alphabet, these sizes are the same. In the regenerating codes terminology, this means we are working 
with minimum storage regenerating (MSR) codes.^1 In this work we restrict our discussion to this setting.

Remark 5 (A complicated landscape). There are many figures of merit and variations on the exact repair 
problem. For example, in addition to rate and bandwidth, we may care about locality; we may care about 
multiple erasures; we may not need to reconstruct the erased node exactly, but simply want to maintain the 
MDS property (this is called functional repair); we may want to leave the MSR parameter regime; and so
on. There is a growing body of work addressing these and other trade-offs, and the lay of the land is still
not fully understood. The reader is referred to the survey [DRWS11] and the very helpful Erasure Coding for
Distributed Storage Wiki [UT] for more details about these and other variants.

2.3 Repair Bandwidth of Reed-Solomon Codes 

Reed-Solomon codes are commonly used for storage, but as mentioned above the traditional strategy (which 
has $b \log_2(|B|) = k \log_2(|F|)$) is not a good idea for the exact repair problem. However, the traditional
strategy is not the best one can do! In [SPDC14], Shanmugam et al. develop a general framework for
studying the repair bandwidth of scalar MDS codes, that is, codes whose symbols naturally come from
some field $F$ rather than being constructed specifically as vectors over $B$. As one of their examples, they
show that for a few specific Reed-Solomon codes, one can do better than the naive scheme. 

More precisely, [SPDC14] adapts techniques from interference alignment (which have been previously used
to construct good regenerating codes) to the scalar MDS setting. For general MDS codes with $k = n - 2$, they
give a polynomial-time algorithm which will find the optimal linear systematic repair scheme returning a
single symbol from the subfield $B$. They apply this to find optimal linear exact systematic repair schemes for
a (5,3) and (6,4)-Reed-Solomon code,^2 and they find non-trivial systematic repair schemes for the (14,10)
Reed-Solomon code used in a module for the Apache Hadoop Distributed File System which is currently
deployed by Facebook.

There have been works which use RS codes as a building block for codes for distributed storage and related 
problems [SAP+13, HPZV13, HZM12, RSKR09b, TPD13, TB14b, TB14a, HLKB15]. These works modify
RS codes by, for example, adding parity checks, taking subcodes, folding, concatenating with other codes, 
and so on, but to the best of our knowledge, only the work of [SPDC14] described above addresses the repair
bandwidth of Reed-Solomon codes themselves. Before we describe the rest of the literature surrounding
exact recovery, we note two differences between our approach and that of [SPDC14].

• First, in [SPDC14], the proof applies only for $k = n - 2$, while our approach works for all $(n, k)$. On the
other hand, their approach works for any MDS code, while ours is tailored for Reed-Solomon codes.

• Second, [SPDC14] considers exact repair of systematic nodes only. That is, $k$ of the $n$ storage nodes
hold the original message, and the rest are viewed as parity checks; only these $k$ special nodes are
required to be repairable. In contrast, our approach guarantees recovery of all $n$ nodes.

^1 These codes are referred to as minimum storage because the storage in each of the $n$ nodes is as small as possible, given
that any $k$ nodes should be able to reconstruct the message.

^2 An $(n, k)$-RS code has block length $n$ and dimension $k$.

As pointed out in [SPDC14], understanding the repair bandwidth of Reed-Solomon codes is an important
problem, even if RS codes are not the best codes available. Indeed, these codes are implemented in practice
in distributed storage systems (the example from [SPDC14] is the HDFS-RAID module, which we will return
to in Section 5.3), and it may be easier to implement improved algorithms on existing systems rather than
replace the system. Pinning down the repair bandwidth of Reed-Solomon codes was asked as an open
question by Dimakis et al. in [DRWS11].

2.4 Existing Results for the Exact Repair Problem for General MDS Codes 

In order to set expectations, we briefly survey the upper and lower bounds available for exact repair using 
MDS codes (not necessarily RS codes). There are two main parameter regimes, depending on the parameter 
t. This parameter (which is the number of sub-symbols per symbol, or the degree of F over B) controls the 
level of subpacketization in the regenerating code. The first parameter regime, more commonly studied for 
regenerating codes, is when t is (very) large compared to n — k. The second parameter regime, more natural 
for RS codes, is when t is small compared to n — k. Both settings have their advantages. When t is large, 
each symbol can be sub-divided further (we have more subpacketization), and as we will see this allows for 
better bandwidth guarantees. On the other hand, when t is small, the field extension F over B is smaller, 
and this is easier to work with in practice. 

In this work we consider both parameter regimes. Our main focus is constant-rate RS codes with $A = F$,
and so $t = \log_2(n)$ is small compared to $n - k$. However, our framework also works for RS codes with $A \subseteq F$
and with $n - k$ very small, and we give examples of constructions when $t$ is large compared to $n - k$ as well.


Regime 1: large t. When $t$ is sufficiently large, it is known that the “correct” answer for the repair
bandwidth is

$$b = \frac{td}{d + 1 - k}. \qquad (\star)$$

The lower bound on $b$ is a fundamental result of [DGW+10, WDR07], and actually holds for functional repair
as well as exact repair.^3 For the upper bound, it is shown in [SR10b, CJM+13] that as $t \to \infty$ (much faster
than $n$), the exact repair bandwidth can approach $(\star)$. However, for this result, $t$ must scale exponentially in
$n$. It is conjectured that this exponential scaling is necessary [TWB14], but the best that is known is that $t$
must be at least $k^2$ in order for $(\star)$ to hold; for very high-rate codes, with $k = n - O(1)$, we do know that
$t > \exp(\sqrt{n})$ is required [GTC14]. There are also several schemes achieving $(\star)$ exactly for particular parameter
settings and/or systematic repair only, and for large $t$ [CHJL11, SR10a, RSKR09a, PDC11, CHL11, TWB13].


Regime 2: small t. When $t$ is small compared to $n - k$, on the other hand, it is clear that $(\star)$ cannot be
met. Indeed,

$$b \ge k + t - 1 \qquad (\star\star)$$

is a trivial lower bound on the exact repair bandwidth for any MDS code.^4 Thus, if $t < n - k$, we have
$b > \frac{td}{d + 1 - k}$. In this regime, we must have the ratio $b/t$ tend to infinity. However, it is still the case that a
bound of $b = k + t - 1$ is much better than the naive bound of $b = tk$.
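The relationship between the two regimes can be checked numerically. The sketch below assumes that $(\star)$ has the standard cut-set form $td/(d + 1 - k)$ and that $(\star\star)$ is $k + t - 1$; the function names and the two parameter settings are illustrative choices, not taken from the paper.

```python
# Illustrative comparison. Assumption: (*) is t*d/(d+1-k), (**) is k+t-1.
from fractions import Fraction


def cut_set_bound(t, k, d):
    """Sub-symbols needed according to (*) when d helper nodes respond."""
    return Fraction(t * d, d + 1 - k)


def trivial_lower_bound(t, k):
    """The bound (**): k - 1 nodes must respond, plus one symbol's worth."""
    return k + t - 1


def naive_bandwidth(t, k):
    """Download all t sub-symbols from each of k nodes."""
    return t * k


# Small t (t < n - k): n = 16, k = 8, t = 4, all n - 1 surviving nodes help.
small = (cut_set_bound(4, 8, 15), trivial_lower_bound(4, 8), naive_bandwidth(4, 8))

# Large t (t >= n - k): a (14, 10) code with t = 8.
large = (cut_set_bound(8, 10, 13), trivial_lower_bound(8, 10), naive_bandwidth(8, 10))
```

In the first setting the trivial bound exceeds the cut-set value, so $(\star)$ is out of reach, while in the second the cut-set value is the larger (binding) lower bound. Both remain far below the naive cost $tk$.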

We are only aware of two works addressing the exact repair problem when $t$ is small compared to $n - k$.
The first is [WD09], who give a scheme with bandwidth $(k - 1)t + 1$. Since the naive scheme has bandwidth
$kt$, this is a slight improvement. The second work is [RSKR09a]. There the authors give optimal schemes,
meeting $(\star\star)$ for $t > k - 2$. However, for $t < k$ these results hold only for systematic nodes. They also show
that, when only one sub-symbol is downloaded from every node and the reconstruction algorithm is linear,
$(\star\star)$ cannot be met for $t < k - 3$.

^3 In the functional repair problem, the replacement node needn’t be a copy of the lost node; rather it just must maintain the
MDS property. For some applications this is enough. However, for several reasons it is also useful to study the exact repair
problem [SR10a]. Further, for us, only the exact repair problem makes sense given that we want to study a fixed code $RS(A, k)$.

^4 To see this, imagine that we download only a single sub-symbol from each of $k - 1$ nodes and are given the remaining
symbols for free. Because the code is MDS, the final symbol could be anything; thus we need to read at least one more
symbol’s worth of information (or at least $t$ more sub-symbols) to determine it.

We give a more detailed summary of known results for the exact repair problem for MDS codes, and 
compare them to our results for Reed-Solomon codes, in a table in Appendix C. We outline our results in
more detail in the next section. 


3 Results Overview 

Our main result is pinning down the best exact repair bandwidth of Reed-Solomon codes with $A = F$ that
can be achieved with linear schemes (Definition 5). This will follow from a characterization of linear exact
repair schemes for RS codes, which is given in Section 4.

To formulate our characterization, we first show in Proposition 6 that any linear exact repair scheme proceeds
roughly as follows. First, we write the erased data $f(\alpha^*)$ as a linear combination of the available data $f(\alpha)$:

$$\zeta f(\alpha^*) = \sum_{\alpha \in A \setminus \{\alpha^*\}} \mu_{\zeta,\alpha}(\alpha^*) f(\alpha),$$

and we may do this for several different $\zeta \in F$. Next, we take the trace of both sides:

$$\mathrm{tr}_{F/B}(\zeta f(\alpha^*)) = \sum_{\alpha \in A \setminus \{\alpha^*\}} \mathrm{tr}_{F/B}(\mu_{\zeta,\alpha}(\alpha^*) f(\alpha)).$$

If, for each $\alpha$, the node corresponding to $\alpha$ delivers $\mathrm{tr}_{F/B}(\mu_{\zeta,\alpha}(\alpha^*) f(\alpha))$, we can recover $\mathrm{tr}_{F/B}(\zeta f(\alpha^*))$. If
we do this for enough different $\zeta$’s we can recover $f(\alpha^*)$. Thus, our goal is to find $\mu$’s and $\zeta$’s so that there
are many collisions between the multipliers $\mu_{\zeta,\alpha}(\alpha^*)$ that a given node $\alpha$ is responsible for. As stated, this
appears to be a daunting task. However, we show in Theorem 4 that this task is equivalent to the problem
of finding some nice polynomials over $F$, and this will give us our characterization.

In Section 5, we use this characterization with trace polynomials to obtain an exact repair scheme for
high-rate Reed-Solomon codes which use the whole field as evaluation points. More precisely, we prove the
following theorem.

Theorem 1. Let $B \le F$ be any subfield of $F$, and let $k = (1 - 1/|B|)|F|$. Then the Reed-Solomon code
$RS(F, k)$ of rate $1 - 1/|B|$ which uses the entire field $F$ as evaluation points admits a linear exact repair
scheme over $B$ with repair bandwidth $n - 1$.

As per Remark 2, it is instructive also to write this in terms of bits. Returning a symbol of $B$ is equivalent
to returning $\log_2(|B|)$ bits, and we have the following corollary.

Corollary 2. Suppose that $F$ has characteristic 2. Let $B \le F$ be a subfield and let $\epsilon = |B|^{-1}$. Then there
is a linear exact repair scheme for $RS(F, (1 - \epsilon)|F|)$ over $GF(2)$ with repair bandwidth $(n - 1) \log_2(1/\epsilon)$.

This scheme is nearly optimal for MDS codes with any repair scheme; the lower bound is $k + t - 1$
sub-symbols, and Theorem 1 uses $k/(1 - \epsilon) - 1$ sub-symbols. However, our second contribution is to prove an
even stronger lower bound for linear repair schemes. More precisely, in Section 6 we show that Corollary 2
is optimal, even up to the leading constants, for linear schemes.

Theorem 3. Let $C$ be an MDS code of dimension $k$ with evaluation points $A$ over a field $F$. Let $B \le F$ be
a subfield. Any linear repair scheme for $C$ over $B$ must have bandwidth (measured in sub-symbols of $B$) at
least

$$b \ge (n - 1) \log_{|B|}\left(\frac{n - 1}{n - k}\right).$$

In particular, the bandwidth (measured in bits) for any linear repair scheme for an MDS code with rate $1 - \epsilon$
over any base field $B$ is at least

$$b \log_2(|B|) \ge (n - 1) \log_2\left(\frac{1}{\epsilon}\left(1 - \frac{1}{n}\right)\right).$$
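To see how tight the upper and lower bounds are, the following sketch evaluates them for one concrete setting. The choice $|F| = 256$, $|B| = 16$ and the function names are illustrative assumptions. The formulas are the bit counts from Corollary 2, Theorem 3, and the naive scheme's $k \log_2(n)$.

```python
# Illustrative arithmetic; the parameter choice and names are assumptions.
import math


def upper_bound_bits(n, size_b):
    """Corollary 2: (n - 1) sub-symbols of B, i.e. (n - 1) * log2|B| bits."""
    return (n - 1) * math.log2(size_b)


def lower_bound_bits(n, k):
    """Theorem 3 in bits: (n - 1) * log2((n - 1) / (n - k)), independent of |B|."""
    return (n - 1) * math.log2((n - 1) / (n - k))


def naive_bits(n, k):
    """Read k whole symbols of log2(n) bits each (here n = |F|)."""
    return k * math.log2(n)


n, size_b = 256, 16                   # F = GF(256), B = GF(16)
k = n - n // size_b                   # k = (1 - 1/|B|) * n = 240
upper = upper_bound_bits(n, size_b)   # 255 * 4 = 1020 bits
lower = lower_bound_bits(n, k)
naive = naive_bits(n, k)              # 240 * 8 = 1920 bits
```

In this setting the gap between the two bounds is $(n - 1)\log_2(n/(n - 1))$, which is less than two bits, while the naive scheme downloads almost twice as much.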

We also give a few other examples of how to use our characterization for RS codes with $A \neq F$. In
Section 5.2 we give an example of an RS code with non-trivial bandwidth when $|F|$ might be arbitrarily
larger than $n$. In Section 5.3 we use our characterization, along with a computer search, to find a scheme
which improves the result of [SPDC14] for the code used in the HDFS-RAID module [HFS]; this module is
currently deployed in the Facebook Hadoop Analytics cluster [SAP+13].

3.1 Subsequent work 

After a preliminary version of this paper was released, it was shown in [YB16] that in fact our framework
can be used to obtain good repair schemes for RS codes in the “large t” regime, meeting the cut-set bound $(\star)$.
More precisely, for any field $B$, and for any $n, k$, they show how to choose $n$ evaluation points $A$ in an
extension field $F$ of $B$ of degree $t = (n - k)^n$, so that $RS(A, k)$ has repair bandwidth $b = t \cdot \frac{n - 1}{n - k}$.
Thus, the final take-away should be that Reed-Solomon codes are competitive (in terms of bandwidth) with
regenerating codes in both parameter regimes!

4 Characterization of linear repair schemes for MDS codes 

In this section, we give a characterization of linear exact repair schemes for MDS codes. The following
theorem says that a linear exact repair scheme for a $k$-dimensional MDS code is equivalent to being able
to find, for each $\alpha^* \in A$, a set $\mathcal{P}(\alpha^*)$ of dual codewords $p \in C^\perp$ so that $\{p(\alpha) : p \in \mathcal{P}(\alpha^*)\}$ spans a
low-dimensional subspace for $\alpha \neq \alpha^*$, and spans a high-dimensional subspace for $\alpha = \alpha^*$.

Theorem 4. Let $B \le F$ be a subfield so that the degree of $F$ over $B$ is $t$, and let $A \subseteq F$ be any set of
evaluation points. Let $C \subseteq F^n$ be an MDS code of dimension $k$, with evaluation points $A$. The following are
equivalent.

(1) There is a linear repair scheme for $C$ over $B$ with bandwidth $b$.

(2) For each $\alpha^* \in A$, there is a set $\mathcal{P}(\alpha^*) \subseteq C^\perp$ of size $t$, so that

$$\dim_B(\{p(\alpha^*) : p \in \mathcal{P}(\alpha^*)\}) = t,$$

and the sets $\{p(\alpha) : p \in \mathcal{P}(\alpha^*)\}$ for $\alpha \neq \alpha^*$ satisfy

$$b \ge \max_{\alpha^* \in A} \sum_{\alpha \in A \setminus \{\alpha^*\}} \dim_B(\{p(\alpha) : p \in \mathcal{P}(\alpha^*)\}).$$

To prove Theorem 4, we begin by showing that any linear repair scheme for MDS codes may as well have
a particularly nice form. More precisely, we will show that it may as well have the form of Algorithm 1. By
inspection, it is clear that Algorithm 1 is indeed a linear repair scheme for $C$, for any choice of basis $Z$ for
$F/B$ and for any coefficients $\mu_{\zeta,\alpha}(\alpha^*)$ so that (4.1) holds. We record this fact in the following proposition.

Proposition 5. Algorithm 1 is a linear repair scheme for $C$ over $B$ with bandwidth

$$b = \max_{\alpha^* \in A} \sum_{\alpha \in A \setminus \{\alpha^*\}} \dim_B(\{\mu_{\zeta,\alpha}(\alpha^*) : \zeta \in Z\}).$$
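Algorithm 1: Framework of generic linear repair scheme for an MDS code $\mathcal{C} \subseteq F^n$ over a subfield $B$,
so that the degree of $F$ over $B$ is $t$.

Input: A set $A$, a failed node $\alpha^* \in A$, and access to linear queries of the form $\mathrm{tr}_{F/B}(\gamma \cdot f(\alpha))$ for
$\alpha \in A \setminus \{\alpha^*\}$, for some $f \in \mathcal{C}$.

Output: The value $f(\alpha^*)$

1 Choose a set $Z \subseteq F$ of size $t$, which has full rank over $B$.

2 Choose coefficients $\mu_{\alpha,\zeta}(\alpha^*)$ for $\alpha \in A \setminus \{\alpha^*\}$ and $\zeta \in Z$ so that

\[ \zeta f(\alpha^*) = \sum_{\alpha \in A \setminus \{\alpha^*\}} \mu_{\alpha,\zeta}(\alpha^*) f(\alpha). \tag{4.1} \]

3 for $\zeta \in Z$ do

4     Let $Q_\alpha(\alpha^*) \subseteq F$ be any spanning set for $\{\mu_{\alpha,\zeta}(\alpha^*) : \zeta \in Z\}$ over $B$, and query
      $\mathrm{tr}_{F/B}(\gamma \cdot f(\alpha))$ for $\gamma \in Q_\alpha(\alpha^*)$.

5     Using the $B$-linearity of $\mathrm{tr}_{F/B}$, compute $\mathrm{tr}_{F/B}(\mu_{\alpha,\zeta}(\alpha^*) f(\alpha))$ for each $\alpha \in A \setminus \{\alpha^*\}$.

6     Construct $\mathrm{tr}_{F/B}(\zeta \cdot f(\alpha^*))$ from the identity

\[ \mathrm{tr}_{F/B}(\zeta \cdot f(\alpha^*)) = \sum_{\alpha \in A \setminus \{\alpha^*\}} \mathrm{tr}_{F/B}(\mu_{\alpha,\zeta}(\alpha^*) f(\alpha)), \]

      which follows from taking the trace of both sides of (4.1).

7 end

8 Compute $f(\alpha^*)$ from the data $\{\mathrm{tr}_{F/B}(\zeta \cdot f(\alpha^*)) : \zeta \in Z\}$. More precisely, since $Z = \{\zeta_1, \ldots, \zeta_t\}$ is a
  basis for $F$ over $B$, let $\{\nu_1, \ldots, \nu_t\}$ be the dual basis. Then

\[ f(\alpha^*) = \sum_{i=1}^{t} \mathrm{tr}_{F/B}(\zeta_i \cdot f(\alpha^*)) \, \nu_i. \]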
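The final step of Algorithm 1 (recovering a field element from its traces against a basis) can be
illustrated with a small sketch. Everything concrete here is an assumption made for illustration and
is not taken from the paper: the field is $GF(2^3)$ built from $x^3 + x + 1$, the subfield is $GF(2)$, the
basis is $\{1, x, x^2\}$, and the helper names `mul`, `tr`, `dual_basis` and `recover` are hypothetical.

```python
M = 3            # F = GF(2^3) over B = GF(2), so t = 3 (assumed toy parameters)
MOD = 0b1011     # x^3 + x + 1, irreducible over GF(2)
Q = 1 << M       # |F| = 8


def mul(a, b):
    """Multiply two elements of GF(2^M), stored as bit masks of coefficients."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & Q:
            a ^= MOD
    return r


def tr(a):
    """Trace from F to B: a + a^2 + ... + a^(2^(M-1)); the result is 0 or 1."""
    s = 0
    for _ in range(M):
        s ^= a
        a = mul(a, a)
    return s


def dual_basis(basis):
    """Brute-force the dual basis nu_1..nu_t with tr(zeta_i * nu_j) = [i == j]."""
    dual = []
    for j in range(len(basis)):
        want = [int(i == j) for i in range(len(basis))]
        dual.append(next(v for v in range(Q)
                         if [tr(mul(z, v)) for z in basis] == want))
    return dual


def recover(traces, dual):
    """Step 8: x = sum_i tr(zeta_i * x) * nu_i, with the traces lying in GF(2)."""
    x = 0
    for bit, nu in zip(traces, dual):
        if bit:
            x ^= nu
    return x


basis = [1, 2, 4]            # {1, x, x^2}
dual = dual_basis(basis)
print(all(recover([tr(mul(z, x)) for z in basis], dual) == x for x in range(Q)))
```

The exhaustive search for the dual basis is only sensible for a toy field; it is used here to keep the
sketch short, not because the scheme requires it.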
Moreover, any linear repair scheme can be written in the form of Algorithm 1. Proposition 6 and the
ensuing Corollary 7 make this precise.

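Proposition 6. Suppose there is a linear repair scheme for an MDS code $\mathcal{C} \subseteq F^n$ over $B \le F$, given by
query sets $Q_\alpha(\alpha^*)$ and a linear repair algorithm, as in the definition of a linear repair scheme. Then there is
a basis $Z$ for $F$ over $B$ so that the following holds. For each $\alpha^* \in A$ and $\alpha \in A \setminus \{\alpha^*\}$, and $\zeta \in Z$, there are
coefficients $\mu_{\alpha,\zeta}(\alpha^*) \in F$ so that

\[ \zeta f(\alpha^*) = \sum_{\alpha \in A \setminus \{\alpha^*\}} \mu_{\alpha,\zeta}(\alpha^*) f(\alpha) \]

for all $f \in \mathcal{C}$, and so that for all $\alpha \neq \alpha^* \in A$,

\[ \{\mu_{\alpha,\zeta}(\alpha^*) : \zeta \in Z\} \subseteq \mathrm{span}_B(Q_\alpha(\alpha^*)). \]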

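Proof. By assumption, the linear repair algorithm computes coefficients $\lambda_i \in B$ so that

\[ f(\alpha^*) = \sum_{i=1}^{t} \lambda_i \nu_i \]

for some basis $\nu_1, \ldots, \nu_t$ of $F$ over $B$. Since the $\lambda_i$ are $B$-linear functions of the queries, they are of the form

\[ \lambda_i = \sum_{\alpha \neq \alpha^*} \sum_{\gamma \in Q_\alpha(\alpha^*)} \beta_{\alpha,\gamma,i} \, \mathrm{tr}_{F/B}(\gamma \cdot f(\alpha)) \]

for some coefficients $\beta_{\alpha,\gamma,i} \in B$. Let $\zeta_1, \ldots, \zeta_t$ be the dual basis for $\nu_1, \ldots, \nu_t$, so that
$\mathrm{tr}_{F/B}(\zeta_i \nu_j) = \mathbf{1}_{i=j}$. Then for any $i \le t$,

\[ \mathrm{tr}_{F/B}(\zeta_i f(\alpha^*)) = \lambda_i = \sum_{\alpha \neq \alpha^*} \mathrm{tr}_{F/B}\Big( \sum_{\gamma \in Q_\alpha(\alpha^*)} \beta_{\alpha,\gamma,i} \, \gamma f(\alpha) \Big) =: \sum_{\alpha \neq \alpha^*} \mathrm{tr}_{F/B}\big( \mu_{\alpha,\zeta_i}(\alpha^*) f(\alpha) \big), \tag{4.2} \]

where (4.2) is defining the coefficients $\mu_{\alpha,\zeta_i}(\alpha^*)$. Equation (4.2) holds for all $f \in \mathcal{C}$; since $\mathcal{C}$ is a linear code,
it holds also for the function $\gamma \cdot f(X)$ for $\gamma \in F$. This implies that for all $f \in \mathcal{C}$, and for all $\gamma \in F$, we have

\[ \mathrm{tr}_{F/B}\big( \gamma \cdot \zeta_i f(\alpha^*) \big) = \mathrm{tr}_{F/B}\Big( \gamma \cdot \sum_{\alpha \in A \setminus \{\alpha^*\}} \mu_{\alpha,\zeta_i}(\alpha^*) f(\alpha) \Big), \]

which in turn implies that for all polynomials $f \in F[X]$ of degree less than $k$,

\[ \zeta_i f(\alpha^*) = \sum_{\alpha \in A \setminus \{\alpha^*\}} \mu_{\alpha,\zeta_i}(\alpha^*) \cdot f(\alpha). \]

Thus, we have a linear equation of the form required for each $\zeta_i \in Z$. Finally, we observe that the coefficients
$\mu_{\alpha,\zeta_i}(\alpha^*)$ lie where they are supposed to. We have

\[ \mu_{\alpha,\zeta_i}(\alpha^*) = \sum_{\gamma \in Q_\alpha(\alpha^*)} \beta_{\alpha,\gamma,i} \, \gamma, \]

and so

\[ \{\mu_{\alpha,\zeta_i}(\alpha^*) : i = 1, \ldots, t\} \subseteq \mathrm{span}_B(Q_\alpha(\alpha^*)), \]

as desired. □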

Corollary 7. Let $B \le F$ be a subfield and let $A \subseteq F$ be any set of evaluation points. Let $k < |A|$ be any
integer. Let $\mathcal{C} \subseteq F^n$ be an MDS code with evaluation points $A$ and dimension $k$. The following are equivalent.

(1) There is a linear repair scheme for $\mathcal{C}$ over $B \le F$ with bandwidth $b$.

(2) There is a linear repair scheme for $\mathcal{C}$ over $B$ of the form of Algorithm 1 with bandwidth at most $b$.

Proof. The fact that (2) implies (1) follows from Proposition 5. To show that (1) implies (2), suppose that
there is a linear repair scheme for $\mathcal{C}$ with query sets $Q_\alpha(\alpha^*)$. Choose the basis $Z$ and the coefficients
$\mu_{\alpha,\zeta}(\alpha^*)$ in Algorithm 1 as guaranteed by Proposition 6. By Proposition 5, the bandwidth of Algorithm 1,
instantiated this way, is

\[ \max_{\alpha^* \in A} \sum_{\alpha \in A \setminus \{\alpha^*\}} \dim_B(\{\mu_{\alpha,\zeta}(\alpha^*) : \zeta \in Z\}). \]

By Proposition 6 we have, for all $\alpha^*$ and all $\alpha \neq \alpha^*$,

\[ \{\mu_{\alpha,\zeta}(\alpha^*) : \zeta \in Z\} \subseteq \mathrm{span}_B(Q_\alpha(\alpha^*)), \]

and so

\[ \dim_B(\{\mu_{\alpha,\zeta}(\alpha^*) : \zeta \in Z\}) \le |Q_\alpha(\alpha^*)|. \]

Summing over $\alpha \neq \alpha^*$ shows that this bandwidth is at most $b$. The corollary follows. □


Corollary 7 says that coming up with an exact repair scheme for an MDS code is equivalent to the
problem of coming up with the basis $Z$ and the coefficients $\mu_{\alpha,\zeta}(\alpha^*)$. It is not hard to see that finding such
coefficients is equivalent to finding nice dual codewords.


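Observation 8. Fix a set $A \subseteq F$ with $|A| = n$, a subfield $B \le F$ so that $F$ has degree $t$ over $B$, and an
integer $k < n$. Fix $\alpha^* \in A$ and numbers $d_\alpha \le t$ for each $\alpha \in A \setminus \{\alpha^*\}$. Let $\mathcal{C} \subseteq F^n$ be an MDS code with
evaluation points in $A$. The following are equivalent.

(1) There is a basis $Z$ for $F$ over $B$ and coefficients

\[ \{\mu_{\alpha,\zeta}(\alpha^*) : \alpha \in A \setminus \{\alpha^*\},\ \zeta \in Z\} \]

so that for all $f \in \mathcal{C}$, for all $\zeta \in Z$,

\[ \zeta f(\alpha^*) = \sum_{\alpha \in A \setminus \{\alpha^*\}} \mu_{\alpha,\zeta}(\alpha^*) f(\alpha) \tag{4.3} \]

and for all $\alpha \in A \setminus \{\alpha^*\}$,

\[ \dim_B(\{\mu_{\alpha,\zeta}(\alpha^*) : \zeta \in Z\}) = d_\alpha. \]

(2) There is a set $\mathcal{P}(\alpha^*) \subseteq \mathcal{C}^\perp$ of size $t$, so that

\[ \dim_B(\{p(\alpha^*) : p \in \mathcal{P}(\alpha^*)\}) = t \]

and for all $\alpha \in A \setminus \{\alpha^*\}$,

\[ \dim_B(\{p(\alpha) : p \in \mathcal{P}(\alpha^*)\}) = d_\alpha. \]

Proof. This follows from the definition of duality. For the (1) ⇒ (2) implication, given $Z$ and $\mu_{\alpha,\zeta}$, define
$p_\zeta$ on $A$ by $p_\zeta(\alpha^*) = -\zeta$ and $p_\zeta(\alpha) = \mu_{\alpha,\zeta}(\alpha^*)$ for $\alpha \neq \alpha^*$, and let $\mathcal{P}(\alpha^*) = \{p_\zeta : \zeta \in Z\}$. For the
other direction, given $\mathcal{P}(\alpha^*) = \{p_1, \ldots, p_t\} \subseteq \mathcal{C}^\perp$, define $\zeta_i = p_i(\alpha^*)$ and let $Z = \{\zeta_1, \ldots, \zeta_t\}$. Then let
$\mu_{\alpha,\zeta_i}(\alpha^*) = -p_i(\alpha)$. □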

Together, Observation 8 and Corollary 7 prove Theorem 4.

Finally, we apply the reasoning above to Reed-Solomon codes in particular, using the fact that the dual 
of a Reed-Solomon code is again a generalized Reed-Solomon code. A generalized Reed-Solomon (GRS) code 
is the same as a Reed-Solomon code, except that there is an additional vector of multipliers that specify it: 
Definition 6. The generalized Reed-Solomon (GRS) code of dimension $k$ with evaluation points
$A = \{\alpha_0, \ldots, \alpha_{n-1}\} \subseteq F$ and multiplier vector $\lambda \in (F^*)^n$ is given by

\[ GRS(A, k, \lambda) = \left\{ (\lambda_i f(\alpha_i))_{i=0}^{n-1} : f \in F[X],\ \deg(f) < k \right\}. \]

It is well-known (see, for example, [MS77, Thm. 4 in Ch. 10, §8]) that the dual of a Reed-Solomon code
$RS(A, k)$ is a generalized Reed-Solomon (GRS) code $GRS(A, n-k, \lambda)$, where the multiplier $\lambda_i$ is given by

\[ \lambda_i = \prod_{j \neq i} \frac{1}{\alpha_i - \alpha_j}. \]

Since these multipliers do not affect the dimension of the set $\{p(\alpha) : p \in \mathcal{P}(\alpha^*)\}$ in the statement of
Theorem 4, we may leave them out, and we have the following corollary for Reed-Solomon codes.


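Corollary 9. Let $B \le F$ be a subfield so that the degree of $F$ over $B$ is $t$, and let $A \subseteq F$ be any set of
evaluation points. The following are equivalent.

(1) There is a linear repair scheme for $RS(A,k)$ over $B$ with bandwidth $b$.

(2) For each $\alpha^* \in A$, there is a set $\mathcal{P}(\alpha^*) \subseteq F[X]$ of $t$ polynomials of degree less than $n - k$, so
that

\[ \dim_B(\{p(\alpha^*) : p \in \mathcal{P}(\alpha^*)\}) = t, \]

and the sets $\{p(\alpha) : p \in \mathcal{P}(\alpha^*)\}$ for $\alpha \neq \alpha^*$ satisfy

\[ b \ge \max_{\alpha^* \in A} \sum_{\alpha \in A \setminus \{\alpha^*\}} \dim_B(\{p(\alpha) : p \in \mathcal{P}(\alpha^*)\}). \]

Moreover, suppose that (2) holds. Then the linear repair scheme in (1) is given by Algorithm 1 with
coefficients, for $\zeta = p(\alpha^*)$,

\[ \mu_{\alpha,\zeta}(\alpha^*) = p(\alpha) \cdot \frac{\prod_{\beta \in A \setminus \{\alpha^*\}} (\alpha^* - \beta)}{\prod_{\beta \in A \setminus \{\alpha\}} (\alpha - \beta)} \]

and the basis $Z$ given by

\[ Z = \{p(\alpha^*) : p \in \mathcal{P}(\alpha^*)\}. \]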


Thus, the task of finding repair schemes for RS codes boils down to choosing some particularly nice
polynomials $\mathcal{P}$. In the next section, we show several examples of how to pick these polynomials. In particular,
for $A = F$ we choose $\mathcal{P}$ to be trace polynomials, and obtain an optimal linear repair scheme for $RS(F,k)$
for any $k$.


5 Constructions for RS codes 

By Corollary 9, a linear repair scheme for Reed-Solomon codes can be specified by choosing evaluation points
$A$, and, for each $\alpha^* \in A$, a set $\mathcal{P}(\alpha^*)$ of polynomials. In this section, we will make these choices in a few
different parameter regimes. First, in Section 5.1, we will choose $A = F$, and prove Theorem 1, giving an
optimal linear repair scheme for general high-rate RS codes. Next, in Section 5.2, we will give an example of
a code where $n \ll |F|$ is much smaller. Finally, in Section 5.3, we will consider a concrete example, and give
an improved repair scheme for the specific RS code used in the HDFS-RAID module, deployed by Facebook
and studied in [SAP+13, SPDC14].
5.1 When A = F: Optimal Linear Repair Schemes

In this section, we will choose $A = F$ to be the entire field and prove Theorem 1. For the reader's convenience,
we restate the theorem below.

Theorem (Theorem 1). Let $B \le F$ be any subfield of $F$, let $n = |F|$, and choose $k \le n(1 - 1/|B|)$. Then there is a
linear exact repair scheme for $RS(F, k)$ with repair bandwidth $n - 1$ over $B$.

Proof of Theorem 1. We will choose a set $\mathcal{P}(\alpha^*)$ of polynomials of degree less than $n/|B|$ for each $\alpha^* \in F$,
so that the conditions of Corollary 9 hold. Fix any basis $Z \subseteq F$ for $F$ over $B$, and choose

\[ \mathcal{P}(\alpha^*) = \left\{ \frac{\mathrm{tr}_{F/B}(\zeta (X - \alpha^*))}{X - \alpha^*} : \zeta \in Z \right\}. \]

Notice that these indeed have degree $|B|^{t-1} - 1 = |F|/|B| - 1 < n/|B|$. Then, for all $\alpha \neq \alpha^*$, we have

\[ \{p(\alpha) : p \in \mathcal{P}(\alpha^*)\} \subseteq \left\{ \frac{\beta}{\alpha - \alpha^*} : \beta \in B \right\}, \]

and in particular this has dimension 1 over $B$. On the other hand, for the polynomial $p$ corresponding to $\zeta \in Z$,

\[ p(\alpha^*) = \left. \frac{\mathrm{tr}_{F/B}(\zeta (X - \alpha^*))}{X - \alpha^*} \right|_{X = \alpha^*} = \zeta. \]

Thus

\[ \{p(\alpha^*) : p \in \mathcal{P}(\alpha^*)\} = Z, \]

which is by definition full rank. Thus, the conditions of Corollary 9 are satisfied, and the bandwidth of the
resulting scheme is

\[ b = \sum_{\alpha \neq \alpha^*} \dim_B \{p(\alpha) : p \in \mathcal{P}(\alpha^*)\} = \sum_{\alpha \neq \alpha^*} 1 = n - 1. \]

This proves Theorem 1. □
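The proof above can be traced on a toy instance. The following is a minimal sketch, not the authors'
implementation, under assumptions chosen purely for illustration: $F = GF(2^3)$ built from $x^3 + x + 1$,
$B = GF(2)$, $n = |F| = 8$, $k = 4 = n(1 - 1/|B|)$, and basis $Z = \{1, x, x^2\}$. The helper names are
hypothetical. Each surviving node $\alpha$ returns the single bit $\mathrm{tr}_{F/B}(f(\alpha)/(\alpha - \alpha^*))$, and the
replacement node combines these bits using the $B$-valued weights $\mathrm{tr}_{F/B}(\zeta(\alpha - \alpha^*))$.

```python
import random

M = 3            # t = 3: F = GF(2^3) over B = GF(2) (assumed toy parameters)
MOD = 0b1011     # x^3 + x + 1, irreducible over GF(2)
Q = 1 << M       # |F| = n = 8 evaluation points, since A = F


def mul(a, b):
    """Multiply in GF(2^M); elements are bit masks of polynomial coefficients."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & Q:
            a ^= MOD
    return r


def inv(a):
    """Multiplicative inverse by exhaustive search (fine for a tiny field)."""
    return next(x for x in range(1, Q) if mul(a, x) == 1)


def tr(a):
    """Trace from F to B: a + a^2 + ... + a^(2^(M-1)); the result is 0 or 1."""
    s = 0
    for _ in range(M):
        s ^= a
        a = mul(a, a)
    return s


def encode(msg):
    """Evaluate the message polynomial (coefficients, low degree first) on all of F."""
    word = []
    for x in range(Q):
        y = 0
        for c in reversed(msg):      # Horner's rule
            y = mul(y, x) ^ c
        word.append(y)
    return word


def repair(word, a_star, basis=(1, 2, 4)):
    """Recover word[a_star] from one bit per surviving node; word[a_star] is never read."""
    # Each surviving node alpha sends the bit tr(f(alpha) / (alpha - a_star)).
    bits = {a: tr(mul(word[a], inv(a ^ a_star))) for a in range(Q) if a != a_star}
    # tr(zeta * f(a*)) = sum over alpha of tr(zeta * (alpha - a*)) * bit_alpha.
    target = [sum(tr(mul(z, a ^ a_star)) & bit for a, bit in bits.items()) % 2
              for z in basis]
    # Z is a basis, so exactly one field element has these t traces.
    value = next(x for x in range(Q) if [tr(mul(z, x)) for z in basis] == target)
    return value, len(bits)


random.seed(0)
msg = [random.randrange(Q) for _ in range(4)]    # k = 4 message symbols
word = encode(msg)
print(all(repair(word, a) == (word[a], Q - 1) for a in range(Q)))
```

In this toy instance the replacement node downloads $n - 1 = 7$ bits, whereas reading $k = 4$ whole symbols
would cost 12 bits. The last step finds $f(\alpha^*)$ by exhaustive search over the field for brevity; the dual
basis formula from Algorithm 1 would serve the same purpose.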


5.2 Large Field Sizes: Example Construction 

Theorem 3 implies that our construction in Theorem 1 is optimal for Reed-Solomon codes. However, as
we discussed earlier, the assumption that $A = F$ restricts $t \le \log_2(n)$. This is beneficial in some
respects: if the extension field $F$ over $B$ has smaller degree, it is easier to implement in practice, especially
when compared to constructions with $t = 2^n$. However, when $t$ is large compared to $n - k$, this moves us
to the regime where the cut-set bound is the binding lower bound, rather than the bound of Theorem 3. In
this large-t regime, it's possible that the ratio of the bandwidth $b$ (the number of subsymbols downloaded) to
$t$ (the number of subsymbols to be recovered) could tend to a constant. On the other hand, in the small-t regime
where the bound of Theorem 3 is the binding constraint, this ratio must tend to infinity. Thus, allowing $t$ to
get large (increasing the level of sub-packetization) can improve the bandwidth in this sense. We remark that
while the trade-off between the two bounds occurs at $t = n - k$, it is conjectured in [TWB14] that in fact $t$ must
be exponentially large in $k$ in order for the cut-set bound to be attainable, and it's known that $t$ must be at
least $k^2$ [GTC14].

With this trade-off in mind, we show how to use our framework to construct non-trivial linear repair
schemes for RS codes with $A \subseteq F$ much smaller than the entire field. This section is meant as a
proof-of-concept; while our results are non-trivial, they are far from the cut-set bound, and indeed the
aforementioned result of [GTC14] implies that with the particular parameters of our construction below, we
cannot hope to attain that.

In a preliminary version of this work, we left it as an open question to take advantage of large $t$ with our
approach. In subsequent work [YB16], Ye and Barg were able to do this, and show how to meet the cut-set bound
when $t$ is sufficiently large. In that construction, $t = (n - k)^n$. It is still an interesting open question how small
one can take $t$ in order to come up with RS codes meeting the cut-set bound.
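Theorem 10. Suppose that $F = GF(2^s)$ for even $s$. Choose any even $n \le 2(\sqrt{|F|} - 1)$. There is a set
of $n$ evaluation points $A$ so that for any $k \le n - 2$, $RS(A,k)$ admits a linear exact repair scheme over
$B = GF(2^{s/2})$ so that the bandwidth in bits is at most

\[ b \log_2(|B|) \le \frac{s}{2} \left( \frac{3n}{2} - 2 \right). \]

In particular, choosing $k = n - 2$, we have an extremely high-rate code with bandwidth

\[ b \log_2(|B|) \le \left( \frac{3}{4} + o(1) \right) sk. \]

Notice that the naive scheme has bandwidth $sk$ bits, and the cut-set bound gives a lower bound of $sk/2$ bits.

Proof. Let $\gamma$ be a primitive element of $F$. Choose $A$ to consist of $n/2$ points from $B^*$ and $n/2$ points from
$\gamma B^*$. Choose the polynomials

\[ \mathcal{P}(\alpha^*) = \begin{cases} \{1, X\} & \alpha^* \in \gamma B^* \\ \{1, \gamma^{-1} X\} & \alpha^* \in B^* \end{cases} \]

It is easy to check that in either case, the set $\{p(\alpha^*) : p \in \mathcal{P}(\alpha^*)\}$ has full rank, and that for all $\alpha \neq \alpha^*$, we
have $\dim_B(\{p(\alpha) : p \in \mathcal{P}(\alpha^*)\}) = 1$ whenever $\alpha \notin \alpha^* B^*$. The remaining $n/2 - 1$ points $\alpha \in \alpha^* B^*$ each
contribute dimension at most $t = 2$. Finally, the polynomials in $\mathcal{P}(\alpha^*)$ are linear, and
so by Corollary 9, as long as $k \le n - 2$, this gives a linear exact repair scheme with bandwidth (in bits) at most

\[ \frac{s}{2} \left( \frac{n}{2} + 2 \left( \frac{n}{2} - 1 \right) \right) = \frac{s}{2} \left( \frac{3n}{2} - 2 \right). \]

□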
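The dimension count in this proof can be checked on the smallest instance. The sketch below is an
illustration under assumed parameters that do not come from the paper: $s = 4$, $F = GF(2^4)$ built from
$x^4 + x + 1$ with $\gamma = x$, $B = GF(2^2)$, and $n = 6$, so that $A = B^* \cup \gamma B^*$. All helper names are
hypothetical. It computes, for every $\alpha^*$, the number of sub-symbols of $B$ that Corollary 9 charges to the
polynomials chosen in the proof.

```python
M, MOD = 4, 0b10011      # F = GF(2^4) = GF(2)[x] / (x^4 + x + 1), so s = 4 (assumed)
Q = 1 << M
GAMMA = 0b0010           # the element x, which is primitive for this modulus


def mul(a, b):
    """Multiply in GF(2^M); elements are bit masks of polynomial coefficients."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & Q:
            a ^= MOD
    return r


def power(a, e):
    r = 1
    for _ in range(e):
        r = mul(r, a)
    return r


def inv(a):
    return power(a, Q - 2)


B = [a for a in range(Q) if power(a, 4) == a]     # the subfield GF(2^2)
B_STAR = [a for a in B if a]
GB_STAR = [mul(GAMMA, b) for b in B_STAR]
A = B_STAR + GB_STAR                              # n = 6 evaluation points


def dim_over_B(values):
    """Dimension over B of the span of `values`; at most t = 2 here."""
    nonzero = [v for v in values if v]
    if not nonzero:
        return 0
    u_inv = inv(nonzero[0])
    return 1 if all(mul(v, u_inv) in B for v in nonzero) else 2


def polys(a_star):
    """The two linear polynomials from the proof, as functions on F."""
    if a_star in GB_STAR:
        return [lambda x: 1, lambda x: x]
    g_inv = inv(GAMMA)
    return [lambda x: 1, lambda x: mul(g_inv, x)]


def bandwidth(a_star):
    """Sub-symbols of B downloaded to repair a_star, following Corollary 9."""
    P = polys(a_star)
    assert dim_over_B([p(a_star) for p in P]) == 2      # full rank at a_star
    return sum(dim_over_B([p(a) for p in P]) for a in A if a != a_star)


print([bandwidth(a) for a in A])
```

For these parameters each sub-symbol is $s/2 = 2$ bits, so the printed counts can be compared directly with
the naive cost of $k$ full symbols of $s = 4$ bits each.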




5.3 A Specific Example: a (14,10)-GRS Code. 

In this section, we give a linear exact repair scheme for the generalized Reed-Solomon code (see Definition 6)
currently deployed in the Facebook Hadoop Analytics cluster. Notice that for the exact repair problem, an
exact repair scheme for $RS(A, k)$ gives an exact repair scheme for $GRS(A, k, \lambda)$ for any $\lambda$. Indeed, the $i$-th
node holds the symbol $\lambda_i f(\alpha_i)$, and knows $\lambda_i$, so it also can compute $f(\alpha_i)$.

The HDFS-RAID [HFS] module is an open-source module which implements coding for distributed
storage in the Apache Hadoop Distributed File System (HDFS). This module allows for the use of
Reed-Solomon codes, and it implements a particular (14,10)-GRS code; this is currently deployed in the Facebook
Hadoop Analytics cluster [SAP+13]. This GRS code was used as a benchmark in [SAP+13] for comparison
with novel regenerating storage schemes, and [SPDC14] improves on the naive repair scheme for this GRS
code (the naive scheme is the one implemented in the module). The latter work gives a non-trivial exact
repair scheme for this particular code which can recover the systematic nodes (that is, the 10 out of the 14
nodes interpreted as holding the original data).

Using our characterization, it was quick to produce (via a computer search) a scheme that performs
better than that of [SPDC14] and which can also recover all of the nodes, not only the systematic ones. We
give the details of the code, our search, and our results below.

The HDFS-RAID module (see [HFS], code at [FB], and the explicit generating matrix given in [SPDC14])
implements a GRS code over $F = GF(2^8)$ defined as follows. Let $\zeta$ be a primitive element of $F$ (more
precisely, $\zeta$ is a root of the primitive polynomial $1 + x^2 + x^3 + x^4 + x^8$). The code is given by

\[ \mathcal{C} = \left\{ (c_0, c_1, \ldots, c_{13}) : c(1) = c(\zeta) = c(\zeta^2) = c(\zeta^3) = 0 \right\}, \]

where $c(X) = \sum_{i=0}^{13} c_i X^i$. It is easiest to describe $\mathcal{C}$ as above (using the dual formulation), but it is not hard
to verify that $\mathcal{C}$ is indeed a GRS code: $\mathcal{C} = GRS(A, k, \lambda)$ with evaluation points

\[ A = \{1, \zeta, \zeta^2, \ldots, \zeta^{13}\} \]

and some vector $\lambda$. As mentioned above, for the exact repair problem, only the evaluation points $A$ matter,
and any exact repair scheme for $RS(A, k)$ will give an exact repair scheme for $\mathcal{C} = GRS(A, k, \lambda)$.

The work [SPDC14] gives an improved scheme for $\mathcal{C}$. More precisely, they show how to recover each of the
10 systematic nodes using bandwidth at most 65 bits; the naive bound is 80 bits, and the cut-set lower bound is 20
bits. We give a scheme which uses at most 64 bits per node, and additionally can recover from any failure, not
just the failure of a systematic node. We give a linear repair scheme over $B = GF(2^4)$. Such a scheme is
specified by a choice of two degree-3 polynomials over $GF(2^8)$ for each $\alpha^* \in A$. Our polynomials are given in
Appendix D, and our code can be found at http://sites.google.com/site/marywootters/exhaust_FB.sage.

To find this scheme, we searched over all such polynomials which had three distinct roots in $A$; the
reason for this assumption was to speed up the search, with the intuition that a value $p(\alpha) = 0$ for $\alpha \in A$
automatically reduces the dimension of the set $\{p(\alpha) : p \in \mathcal{P}(\alpha^*)\}$. This was a reasonably quick search and it
produced a good solution to the exact repair problem for this particular code. However, both the assumption
about the roots and the large size of $B$ potentially limit the performance of this code; these assumptions
were made so that naive search would be fast. It is an interesting and important question whether, given evaluation
points $A$ and a base field $B$, one can (sometimes) efficiently find a (near-)optimal linear exact recovery
scheme for $RS(A, k)$ over $B$.
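The quantity that such a search minimizes can be sketched as follows. This is not the authors' Sage
script; it is a hedged illustration of the objective from Corollary 9 for $t = 2$. It assumes the field
$GF(2^8)$ is built from the modulus `0x11D` (that is, $x^8 + x^4 + x^3 + x^2 + 1$) with $\zeta = x$, and every helper
name is hypothetical. Given a candidate pair of polynomials for a failed node, it returns the repair
bandwidth in bits, or `None` when the pair does not have full rank at $\alpha^*$.

```python
MOD = 0x11D      # x^8 + x^4 + x^3 + x^2 + 1, the assumed primitive polynomial
Q = 256          # |F| for F = GF(2^8)
ZETA = 2         # the element x


def mul(a, b):
    """Multiply in GF(2^8); elements are bit masks of polynomial coefficients."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & Q:
            a ^= MOD
    return r


def power(a, e):
    r = 1
    for _ in range(e):
        r = mul(r, a)
    return r


def inv(a):
    return power(a, Q - 2)


A = [power(ZETA, i) for i in range(14)]              # evaluation points 1, zeta, ..., zeta^13
B = {a for a in range(Q) if power(a, 16) == a}       # the subfield GF(2^4)


def evaluate(coeffs, x):
    """Horner evaluation; coefficients are listed from low degree to high."""
    y = 0
    for c in reversed(coeffs):
        y = mul(y, x) ^ c
    return y


def dim_over_B(values):
    """Dimension over B of the span of `values`; at most t = 2 here."""
    nonzero = [v for v in values if v]
    if not nonzero:
        return 0
    u_inv = inv(nonzero[0])
    return 1 if all(mul(v, u_inv) in B for v in nonzero) else 2


def bandwidth_bits(P, a_star):
    """Bits downloaded to repair a_star with the polynomial pair P, or None if P is invalid."""
    if dim_over_B([evaluate(p, a_star) for p in P]) != 2:
        return None
    return 4 * sum(dim_over_B([evaluate(p, a) for p in P]) for a in A if a != a_star)


# A deliberately poor candidate, {1, X}, to show why a search is needed.
print(bandwidth_bits([[1], [0, 1]], A[1]))
```

The candidate $\{1, X\}$ is only a baseline for exercising the function; the polynomials actually found by
the authors are the ones listed in Appendix D.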

6 Lower bounds 

In this section, we prove Theorem 3, which gives a lower bound exactly matching Theorem 1. For the reader's
convenience, we restate Theorem 3 here.

Theorem (Theorem 3). Let $\mathcal{C}$ be an MDS code of dimension $k$ with evaluation points $A$ over a field $F$. Let
$B \le F$ be a subfield. Any linear repair scheme for $\mathcal{C}$ over $B$ must have bandwidth (measured in subsymbols
of $B$) at least

\[ b \ge (n-1) \log_{|B|} \left( \frac{n-1}{n-k} \right). \]

In particular, the bandwidth (measured in bits) for any linear repair scheme over any base field $B$ is at least

\[ b \log_2(|B|) \ge (n-1) \log_2 \left( \frac{n-1}{n-k} \right). \]

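Proof of Theorem 3. Fix any $\alpha^* \in A$, and consider any linear exact repair scheme which repairs the node
corresponding to $\alpha^*$ using $b$ sub-symbols from $B$. By Theorem 4 there is some set $\mathcal{P} \subseteq \mathcal{C}^\perp$ of size $t$ so that

\[ \{p(\alpha^*) : p \in \mathcal{P}\} \tag{6.1} \]

has full rank over $B$, and so that

\[ \dim_B(\{p(\alpha) : p \in \mathcal{P}\}) = d_\alpha \tag{6.2} \]

where

\[ b = \sum_{\alpha \in A \setminus \{\alpha^*\}} d_\alpha. \]

For any vector $x \in B^t$ (indexed by the functions $p \in \mathcal{P}$) let $p_x : F \to F$ be

\[ p_x(X) = \sum_{p \in \mathcal{P}} x_p \cdot p(X). \]

Let

\[ S_\alpha = \{x \in B^t : p_x(\alpha) = 0\}. \]

By (6.2), $S_\alpha$ is a vector space over $B$ of dimension $t - d_\alpha$. Thus, on average over all nonzero $x \in B^t$, we have

\[ \frac{1}{|B|^t - 1} \sum_{x \in B^t \setminus \{0\}} \left| \{\alpha \in A \setminus \{\alpha^*\} : x \in S_\alpha\} \right| = \frac{1}{|B|^t - 1} \sum_{\alpha \in A \setminus \{\alpha^*\}} \left( |B|^{t - d_\alpha} - 1 \right) =: r. \]

In particular, there exists some nonzero $x^* \in B^t$ so that $|\{\alpha : x^* \in S_\alpha\}| \ge r$. Consider

\[ p^*(X) = \sum_{p \in \mathcal{P}} x^*_p \cdot p(X). \]

By the choice of $x^*$, $p^*$ vanishes on at least $r$ points of $A \setminus \{\alpha^*\}$. Notice also that $p^*$ is nonzero. Indeed, if
it were zero, then

\[ \sum_{p \in \mathcal{P}} x^*_p \cdot p(\alpha^*) = 0, \]

contradicting (6.1). However, $p^* \in \mathcal{C}^\perp$, since $\mathcal{C}^\perp$ is a linear code. Since $\mathcal{C}$ is an MDS code, so is $\mathcal{C}^\perp$, and in
particular its distance is $k + 1$, so a nonzero codeword of $\mathcal{C}^\perp$ vanishes on at most $n - k - 1$ points. Since $p^*$ is
nonzero, this implies that $r < n - k$; more precisely $r \le n - k - 1$, which, using $k \le |B|^t$, implies that

\[ \sum_{\alpha \in A \setminus \{\alpha^*\}} |B|^{-d_\alpha} \le n - k. \]

Thus, we have the bound on bandwidth

\[ b \ge \min_{d_\alpha \in [0,t]} \sum_{\alpha \in A \setminus \{\alpha^*\}} d_\alpha \quad \text{s.t.} \quad \sum_{\alpha \in A \setminus \{\alpha^*\}} |B|^{-d_\alpha} \le n - k. \]

The minimum occurs when all the $d_\alpha$ are balanced, and equal to $\log_{|B|} \left( \frac{n-1}{n-k} \right)$, and we have

\[ b \ge (n-1) \log_{|B|} \left( \frac{n-1}{n-k} \right). \]

This completes the proof. □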
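To get a feel for the numbers, the following sketch evaluates the lower bound of Theorem 3 next to the
bandwidth of the Theorem 1 scheme and the naive bandwidth of $k \log_2(n)$ bits quoted in the abstract, for
$A = F$. The function names and the sample parameters ($n = 256$, $|B| = 2$) are assumptions chosen for
illustration only.

```python
import math


def naive_bits(n, k):
    """Naive repair of RS(F, k) with n = |F|: download k symbols of log2(n) bits each."""
    return k * math.log2(n)


def lower_bound_bits(n, k):
    """Theorem 3, measured in bits: (n - 1) * log2((n - 1) / (n - k))."""
    return (n - 1) * math.log2((n - 1) / (n - k))


def trace_scheme_bits(n, q):
    """Theorem 1 scheme: n - 1 sub-symbols of B, each log2(q) bits, where q = |B|."""
    return (n - 1) * math.log2(q)


n, q = 256, 2
k = n - n // q               # the largest k allowed by Theorem 1, k = n(1 - 1/|B|)
print(lower_bound_bits(n, k), trace_scheme_bits(n, q), naive_bits(n, k))
```

The real-valued bound sits just below the scheme's bandwidth at this choice of $k$, while the naive cost is
several times larger.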

Remark 6 (A simple proof of the cut-set bound for exact repair of MDS codes). The cut-set bound, which holds
for general codes and for functional repair, has a very simple proof for linear exact repair of MDS codes. In
the language of the proof of Theorem 3, the number of roots of $p_x(X)$ must be less than $n - k$ for all nonzero
$x \in B^t$. This implies that for all sets $T \subseteq A \setminus \{\alpha^*\}$ of size $n - k$, we must have $\sum_{\alpha \in T} d_\alpha \ge t$; otherwise there
would be some nonzero $x$ so that $p_x$ vanishes on $T$. Thus, averaging over all sets $T$, we obtain

\[ b = \sum_{\alpha \in A \setminus \{\alpha^*\}} d_\alpha \ge t \cdot \frac{n-1}{n-k}, \]

which is precisely the cut-set bound.


7 Discussion 

Inspired by the exact repair problem for Reed-Solomon codes, we studied a variant of the classical polynomial
interpolation problem. How many bits are needed from $\{f(\alpha) : \alpha \neq \alpha^*\}$ in order to recover $f(\alpha^*)$? We have
shown that this can be much smaller than the number of bits needed using standard polynomial interpolation.
We gave a characterization of the number of bits needed, in terms of another problem about polynomials.

Our results imply that, while Reed-Solomon codes are often given as an example of how not to solve the 
exact repair problem, in fact, with the right reconstruction algorithm, they can be optimal! This is heartening 
news, since RS codes are commonly used in practice. More precisely, we give an optimal linear exact repair 
scheme when block length is n = |F|; to the best of our knowledge, for this level of sub-packetization, Reed- 
Solomon codes significantly out-perform all known schemes. Additionally, we give a few examples of how to 
use this characterization in order to come up with non-trivial repair schemes for other codes. In particular, 
we give an example of a family where the set of evaluation points $A \subseteq F$ is much smaller than the entire
field. We also give an improved exact repair scheme for a particular (14,10)-GRS code used by Facebook. 

Finally, subsequent work using our framework has shown that RS codes can also perform well in the 
“large-t”-regime, and in fact can approach the cut-set bound [YB16] . 

We conclude with a few open questions. 

1. Given a specific RS code (that is, given a set $A \subseteq F$), is there an efficient algorithm which will give a
(near)-optimal linear repair scheme? Our example with the (14,10)-GRS code was small enough that 
we could do an exhaustive search, under some additional assumptions, but even for this code we still 
do not know the best linear repair scheme. 

2. Our scheme when $A = F$ is very efficient in terms of bandwidth, but is not very efficient in terms of the
total number of bits accessed. An inspection of the scheme reveals that most nodes will have to touch
$\Omega(t)$ bits before deciding which $O(1)$ bits to return. In practice, this is also an important concern.
How well can Reed-Solomon codes do when this is taken into account?

3. Our characterization and examples are for linear repair schemes. How much better can one do with 
non-linear repair schemes? 


Acknowledgements 

We thank Alex Dimakis for bringing this problem to our attention and for patiently answering our questions 
about the problem set-up, and Ankit Rawat for helpful feedback. We also thank the Simons Institute at 
Berkeley for their hospitality in Spring 2015, where some of this research was done. 


References 


[CHJL11] Viveck R Cadambe, Cheng Huang, Syed A Jafar, and Jin Li. Optimal repair of mds codes in
distributed storage via subspace interference alignment. arXiv preprint arXiv:1106.1250, 2011.

[CHL11] Viveck R Cadambe, Cheng Huang, and Jin Li. Permutation code: Optimal exact-repair of
a single failed node in mds code based distributed storage systems. In Information Theory
Proceedings (ISIT), 2011 IEEE International Symposium on, pages 1225-1229. IEEE, 2011.

[CJM+13] Viveck R Cadambe, Syed Jafar, Hamed Maleki, Kannan Ramchandran, Changho Suh, et al.
Asymptotic interference alignment for optimal repair of mds codes in distributed storage.
Information Theory, IEEE Transactions on, 59(5):2974-2987, 2013.

[DGW+10] Alexandros G Dimakis, P Godfrey, Yunnan Wu, Martin J Wainwright, and Kannan Ramchandran.
Network coding for distributed storage systems. Information Theory, IEEE Transactions
on, 56(9):4539-4551, 2010.

[DRWS11] Alexandros G Dimakis, Kannan Ramchandran, Yunnan Wu, and Changho Suh. A survey on
network codes for distributed storage. Proceedings of the IEEE, 99(3):476-489, 2011.


[FB] https://github.com/facebookarchive/hadoop-20/tree/master/src/contrib/raid/src/java/org/apache
Accessed: July 2015.

[GTC14] Sreechakra Goparaju, Itzhak Tamo, and Robert Calderbank. An improved sub-packetization
bound for minimum storage regenerating codes. Information Theory, IEEE Transactions on,
60(5):2770-2779, 2014.

[HFS] http://wiki.apache.org/hadoop/HDFS-RAID, Accessed: July 2015.

[HLKB15] Wentao Huang, Michael Langberg, Joerg Kliewer, and Jehoshua Bruck. Communication efficient
secret sharing. arXiv preprint arXiv:1505.07515, 2015.

[HPZV13] Yunghsiang S Han, Hung-Ta Pai, Rong Zheng, and Pramod K Varshney. Update-efficient
regenerating codes with minimum per-node storage. In Information Theory Proceedings (ISIT),
2013 IEEE International Symposium on, pages 1436-1440. IEEE, 2013.

[HZM12] Yunghsiang S Han, Rong Zheng, and Wai Ho Mow. Exact regenerating codes for byzantine fault
tolerance in distributed storage. In INFOCOM Proceedings, pages 2498-2506. IEEE, 2012.

[MS77] Florence Jessie MacWilliams and Neil James Alexander Sloane. The theory of error correcting
codes. Elsevier, 1977.

[PDC13] Dimitris S Papailiopoulos, Alexandros G Dimakis, and Viveck R Cadambe. Repair optimal erasure
codes through hadamard designs. Information Theory, IEEE Transactions on, 59(5):3021-3037,
2013.

[RSK11] Korlakai Vinayak Rashmi, Nihar B Shah, and P Vijay Kumar. Optimal exact-regenerating
codes for distributed storage at the msr and mbr points via a product-matrix construction.
Information Theory, IEEE Transactions on, 57(8):5227-5239, 2011.

[RSKR09a] K.R. Rashmi, Nihar B Shah, P Vijay Kumar, and Kannan Ramchandran. Explicit codes
minimizing repair bandwidth for distributed storage. In Allerton Conference on Communication,
Control, and Computing, pages 1243-1249. IEEE, 2009.

[RSKR09b] KV Rashmi, Nihar B Shah, P Vijay Kumar, and Kannan Ramchandran. Explicit construction
of optimal exact regenerating codes for distributed storage. In Allerton Conference on
Communication, Control, and Computing, pages 1243-1249. IEEE, 2009.

[SAP+13] Maheswaran Sathiamoorthy, Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros G Dimakis,
Ramkumar Vadali, Scott Chen, and Dhruba Borthakur. Xoring elephants: Novel erasure
codes for big data. In Proceedings of the VLDB Endowment, volume 6, pages 325-336. VLDB
Endowment, 2013.

[SPDC14] Karthikeyan Shanmugam, Dimitris S Papailiopoulos, Alexandros G Dimakis, and Giuseppe
Caire. A repair framework for scalar mds codes. Selected Areas in Communications, IEEE
Journal on, 32(5):998-1007, 2014.

[SR10a] Changho Suh and Kannan Ramchandran. Exact-repair MDS codes for distributed storage using
interference alignment. In Information Theory Proceedings (ISIT), 2010 IEEE International
Symposium on, pages 161-165. IEEE, 2010.

[SR10b] Changho Suh and Kannan Ramchandran. On the existence of optimal exact-repair mds codes
for distributed storage. arXiv preprint arXiv:1004.4663, 2010.

[TB14a] Itzhak Tamo and Alexander Barg. Bounds on locally recoverable codes with multiple recovering
sets. In Information Theory (ISIT), 2014 IEEE International Symposium on, pages 691-695.
IEEE, 2014.


[TB14b] Itzhak Tamo and Alexander Barg. A family of optimal locally recoverable codes. Information 
Theory, IEEE Transactions on, 60(8):4661-4676, 2014. 

[TPD13] Itzhak Tamo, Dimitris S Papailiopoulos, and Alexandros G Dimakis. Optimal locally repairable 
codes and connections to matroid theory. In Information Theory Proceedings (ISIT), 2013 IEEE 
International Symposium on, pages 1814-1818. IEEE, 2013. 

[TWB13] Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck. Zigzag codes: Mds array codes with optimal 
rebuilding. Information Theory, IEEE Transactions on, 59(3):1597-1616, 2013. 

[TWB14] I. Tamo, Z. Wang, and J. Bruck. Access versus bandwidth in codes for storage. Information 
Theory, IEEE Transactions on, 60(4):2028-2037, 2014. 

[UT] University of Texas ECE Dept. Erasure Coding for Distributed Storage wiki.
http://storagewiki.ece.utexas.edu/. Accessed: July 2015.

[WD09] Yunnan Wu and Alexandros G Dimakis. Reducing repair traffic for erasure coding-based storage 
via interference alignment. In Information Theory Proceedings (ISIT), 2009 IEEE International 
Symposium on, pages 2276-2280. IEEE, 2009. 

[WDR07] Yunnan Wu, Alexandros G Dimakis, and Kannan Ramchandran. Deterministic regenerating
codes for distributed storage. In Allerton Conference on Control, Computing, and Communication,
2007.

[YB16] Min Ye and Alexander Barg. Explicit constructions of mds array codes and rs codes with 
optimal repair bandwidth. In Information Theory Proceedings (ISIT), 2016 IEEE International 
Symposium on. IEEE, 2016. To appear. 


A Table of notation 

For reference, Figure 2 gives a summary of our notation, and also a translation to common conventions from
the regenerating codes literature. Broadly speaking, we reserve some Greek letters ($\alpha, \beta, \ldots$) for elements of
the finite field $F$ and a few ($\varepsilon, \delta$) for small real numbers; we reserve capital Roman letters ($S, T, A, \ldots$) for
subsets of the finite field $F$; and we reserve some lower-case Roman letters ($b, t, d, \ldots$) for integers and some
($f, p$) for polynomials mapping $F \to F$. We use $X$ as a variable in polynomials.


B Direction of communication 

It may seem strange that the nodes are allowed to return any (linear) function of the data stored in them.
After all, the goal is to minimize communication, and this scheme requires that we ask each remaining node for
a specific (set of) functions. This could potentially result in a lot of communication from the new replacement
node to the existing nodes; meanwhile, the definition of repair bandwidth only captures communication from
the existing nodes to the replacement node. However, while minimizing communication in both directions is
obviously of interest, the regenerating codes literature has focused mostly on the communication from the
existing nodes to the newcomer. The main justification is as follows.

First, notice that we only need to communicate the identity of the failed node, $\alpha^*$, to each remaining
node. If $t$ is the dimension of $F$ over $B$, this requires $t$ sub-symbols to be sent to each remaining node. Thus,
if either (a) not too many nodes need to be contacted or (b) it is easy to broadcast the information $\alpha^*$ to
many nodes, this is not an issue.

Moreover, even if the number of nodes contacted is large compared to the dimension $t$ of $F$ over $B$, this
is not a problem in practice. Because each server is very large, and these codes are typically implemented
over smaller fields, the following set-up (called “data striping”, or an “interleaved RS code”) is common.
(See [RSKR09a] or [RSK11] for a further description of this; we also thank Alex Dimakis for pointing out to
us that this is how Reed-Solomon codes are used by Facebook.)
We still have $n$ servers, each associated with an element $\alpha \in A$; however, instead of holding a single element
of $F$, these servers hold $m$ elements of $F$. We encode a file consisting of $mk \log_2(|F|)$ bits as follows. First,
we break up the file into $m$ messages in $F^k$, interpreted as $m$ polynomials $f_1, \ldots, f_m$ of degree at
most $k - 1$. Then we encode each of these messages with the Reed-Solomon code, obtaining $m$ codewords
$(f_i(\alpha))_{\alpha \in A}$. Finally, we distribute these codewords among the $n$ servers: the server corresponding to $\alpha$
holds $f_i(\alpha)$ for $i = 1, \ldots, m$. This setup is depicted in Figure 3.

Now, suppose the server corresponding to $\alpha^*$ is erased. We wish to set up a replacement server, and we
run our exact repair scheme for each codeword $f_i$, $i = 1, \ldots, m$. Notice that the replacement server needs
to regenerate at least $m$ elements of $F$, so the number of sub-symbols it must download from the remaining
servers is at least $mt$. This swamps the amount of communication going in the other direction: in this set-up,
the replacement server needs to communicate $\alpha^*$ to each of the $n$ servers, which is $nt \ll mt$ sub-symbols.

Thus, even in the case where $t$ is small compared to $n$, it is interesting to consider the one-way
communication captured in the definition of the repair bandwidth.

| Description in terms of the exact repair problem | Our name | Our notation | Common notation in regenerating codes |
| Contents of a node | Symbol | An element of $F$ | - |
| Response from a node | Sub-symbol (or several sub-symbols) | Element(s) of $B \le F$ | Smallest unit of subpacketization; often $F_q$ or $F_p$ |
| Number of blocks in the original file | Message length | $k$ | $k$ |
| Number of nodes | Codeword length | $n$ | $n$ |
| Number of sub-symbols downloaded from each node | - | We allow this to vary from node to node in our definition | $\beta$ |
| Number of sub-symbols contained in each node | Symbol size | $t$ | $\alpha$ |
| Number of sub-symbols in the original file | - | $kt$ | $M$ |
| Number of sub-symbols contained in each message block | - | $t$ | $M/k$ |
| Number of sub-symbols downloaded to repair a single erased node | (Exact) repair bandwidth | $b$ | $\gamma$ |
| Number of nodes accessed to repair a single erased node | (Exact) repair locality | $d$ | $d$ |

Figure 2: A description of our notation and a translation to the standard notation for regenerating codes.
We consider minimum storage regenerating (MSR) codes, so the number of sub-symbols stored in a single
node is the same as the number of subsymbols stored in a block of the file.


Figure 3: How we might implement an RS code for storage in practice, when the size of a server is large
compared to the size of an element of F. Suppose we have an exact repair scheme for the RS codes. When a
server corresponding to $\alpha^*$ fails, the replacement server communicates "$\alpha^*$" to each surviving server. Then
each surviving server sends the replacement server what the exact repair scheme dictates for each of the m
RS codewords. In this set-up, the amount of communication from the replacement server to the existing
servers (which is at most nt) is negligible compared to the amount of communication (which is mb) from the
existing servers to the replacement server, since m is much larger than all other parameters.
C Extended literature review


In Figure 4 we give an extended literature review summarizing work on MDS codes for the exact repair
problem, including our results for RS codes for comparison.


D Explicit construction for Facebook code 

In this section, we present the polynomials returned from our search, which give a linear exact recovery scheme
for the (14,10)-GRS code used in the Facebook Hadoop Analytics cluster. The code which produced these
polynomials can be found at http://sites.google.com/site/marywootters/exhaust_FB.sage. Figure 5
gives an exact repair scheme for $RS(A, 10)$ with $A = \{\zeta^i : 0 \le i \le 13\} \subset GF(2^8)$ over $B = GF(2^4)$. As in
Theorem 21 such a scheme is given by two cubic polynomials for each choice of $\alpha^*$. We have also listed the
bandwidth

$$ b(\alpha^*) \cdot \log_2(|B|) = 4 \cdot \sum_{\alpha \in A \setminus \{\alpha^*\}} \dim_B \{ p(\alpha) : p \in \mathcal{P}(\alpha^*) \} $$

for each choice of $\alpha^*$. The total bandwidth for the scheme is given by the maximum of these, which is 64 bits.
As discussed earlier, an exact repair scheme for $RS(A, 10)$ gives an exact repair scheme for the code
$GRS(A, 10, \lambda)$ used in [SAP+13, SPDC14, HFS]. The previous best result for this code, from [SPDC14], had
bandwidth 65 bits and only recovered the 10 systematic nodes.
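As a quick check of the arithmetic, the sketch below takes the per-node bandwidths from the bandwidth column of Figure 5 and recomputes the total. It assumes 4 bits per sub-symbol (the factor of 4 in the formula above) and 8 bits per stored symbol. The naive figure of 80 bits is our own calculation under those assumptions (downloading k = 10 whole symbols), not a number quoted in this section.

```python
# Per-node bandwidths in bits, transcribed from the bandwidth column of Figure 5
# (one entry per failed node alpha*, in the order the rows are listed).
bandwidth_bits = [64, 64, 60, 60, 60, 64, 64, 64, 60, 64, 64, 60, 64, 64]

SUBSYMBOL_BITS = 4  # assumed log2|B|, matching the factor 4 in the bandwidth formula
SYMBOL_BITS = 8     # assumed log2|F|, i.e. t = 2 sub-symbols per symbol
K = 10              # dimension of the (14,10) code

# Sum of dimensions over the 13 surviving nodes, for each choice of alpha*.
dim_sums = [b // SUBSYMBOL_BITS for b in bandwidth_bits]

scheme_bits = max(bandwidth_bits)  # worst case over all failed nodes
naive_bits = K * SYMBOL_BITS       # download k whole symbols and interpolate

print(scheme_bits, naive_bits, sorted(set(dim_sums)))
```

Under these assumptions the scheme needs at most 64 bits per repaired symbol, compared with 80 bits for naive repair and the 65 bits quoted for the previous best scheme.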
Relevant parameter regime: $t \le n - k$

| Paper | k, n | No. nodes accessed d | No. subsymbols per symbol t | Subsymbol size $\lvert B\rvert$ | Repair bandwidth b | Notes |
|---|---|---|---|---|---|---|
| Naive lower bound | - | - | - | - | $b \ge k + t - 1$ | |
| [WD09] | $n \ge k + 2$ | $d = k + t - 1$ | $t \le n - k$ | $\lvert B\rvert$ may be as large as … | $b = (k - 1)t + 1$ | |
| [RSKR09a] | $n \ge k + 2$ | $d = k + t - 1$ | $t \ge k - 2$ | $\lvert B\rvert \ge$ t n - k | $b = k + t - 1$ | For exact repair of systematic nodes only. |
| Theorem 3 (holds for RS codes with any evaluation points) | $k = (1 - \varepsilon)n$ | | | | $b \ge (n - 1)\log_2(1/\varepsilon) - O(1)$ | Lower bound for linear repair schemes for RS codes. |
| Theorem 1 (RS codes with A = F) | $k = (1 - 1/\lvert B\rvert)n$ | $d = n - 1$ | $t = \log_{\lvert B\rvert}(n)$ | $B \le F$ subfield | $b = n - 1$ | |
| Corollary 2 (RS codes with A = F) | $k = (1 - \varepsilon)n$ | $d = n - 1$ | $t = \log_2(n)$ | $\lvert B\rvert = 2$ | $b = (n - 1)\log_2(1/\varepsilon)$ | Provided $\log_2(1/\varepsilon) \mid t$. |

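Relevant parameter regime: $t \ge n - k$

| Paper | k, n | No. nodes accessed d | No. subsymbols per symbol t | Subsymbol size $\lvert B\rvert$ | Repair bandwidth b | Notes |
|---|---|---|---|---|---|---|
| [DGW+10] | | | | | $b \ge \frac{dt}{d + 1 - k}$ | Lower bound for functional repair |
| [SR10a] | $n \le 2k$ | $d = n - 1 \ge 2k - 1$ | $t = d - k + 1 \ge k$ | $\lvert B\rvert = n - k$ | $b = \frac{dt}{d + 1 - k}$ | |
| [RSK11] | $n \le 2k + 1$ | $d \ge 2k - 2$ | $t = d - k + 1 \ge k - 1$ | $\lvert B\rvert = n - k$ | $b = \frac{dt}{d + 1 - k}$ | |
| [SR10b], [CJM+13] | | | $(d + 1 - k) \cdot$ … | $\lvert B\rvert$ depends on $\Delta$ | $b \le d(\Delta + 1)^c$ (which is about $\frac{dt}{d + 1 - k}$) | $c = c(n, k) = (n - k)(k - 1)$ |
| Theorem 10 (RS codes with a specific $A \subseteq F$) | $k \le n - 2$ | $d = n - 1$ | Any t | $\lvert B\rvert = 2$ | … | The naive result is kt and the lower bound is kt/2, so this is non-trivial but not optimal. |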


Figure 4: Constructions and bounds for exact repair of general MDS codes. Above we summarize 
results for the exact repair problem for MDS codes, separated by parameter regime. The yellow rows indicate 
lower bounds, and the white rows are upper bounds. We note that there are several results known in the 
large-t setting (t exponential in n) for exact recovery of systematic nodes only and for specific rates, like 
n = k + 2 [CHL11, TWB13, PDC13]. We omit the quantitative details of these results in our table, and
refer the reader to m for a more extensive literature review in these special cases. 
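For a sense of scale, the sketch below compares the Corollary 2 bandwidth from the table, (n - 1) log2(1/ε) bits, with the naive bandwidth of k log2(n) bits mentioned in the abstract. The function names and the sample parameters (n = 256 and ε = 1/4) are assumptions chosen so that log2(1/ε) divides t = log2(n); they are not taken from the paper.

```python
import math


def full_field_bandwidth_bits(n, eps):
    """Bandwidth (n - 1) * log2(1/eps) bits, as in the Corollary 2 row of the table."""
    return (n - 1) * math.log2(1 / eps)


def naive_bandwidth_bits(n, k):
    """Naive repair: download k whole symbols of log2(n) bits each."""
    return k * math.log2(n)


n, eps = 256, 0.25     # hypothetical parameters: t = log2(n) = 8, log2(1/eps) = 2
k = int((1 - eps) * n)  # k = (1 - eps) * n = 192

print(full_field_bandwidth_bits(n, eps))  # 510.0
print(naive_bandwidth_bits(n, k))         # 1536.0
```

For these assumed parameters the full-field scheme downloads roughly a third of what naive repair does.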
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a* 

Polynomials 

Bandwidth (in bits) for a* 

c° 

(X+ oix+e){x+e) (X+ e){x+e){x+o 

64 

C' 

(X+c2)(x+ e)ix+e) (X+c")(x+c®)(x+ c) 

64 


(X + C^)(X + C®)(X + c®) (X + C^)(X + C^3)(X + (12) 

60 

C' 

(X+c")(x+c®)(x+c®) (X+ e)ix+e^){x+e^) 

60 


(X + C")(X + C®)(X + c®) (X + C2)(X + e^)(x + (12) 

60 

C' 

(X + (1)(X + (3)(X + (9) (X + (3)(X + (4)(X + (11) 

64 

c® 

(X + (1)(X + (2)(X + (10) (X + (1)(X + (0)(X + (12) 

64 

C" 

(X + (1)(X + (2)(X + (8) (X + (1)(X + (0)(X + (12) 

64 

c® 

(X + (2)(X + (0)(X + (0) (X + (2)(X + (13)(X + (12) 

60 

c® 

(X + (1)(X + (2)(X + (5) (X + (3)(X + (8)(X + (0) 

64 

^10 

(X + (1)(X + (2)(X + (0) (X + (1)(X + (0)(X + (13) 

64 


(X + (2)(X + (0)(X + (0) (X + (2)(X + (13)(X + (12) 

60 

^12 

(X + (1)(X + (2)(X + (5) (X + (1)(X + (0)(X + (13) 

64 

^13 

(X + (1)(X + (2)(X + (3) (X + (3)(X + (8)(X + (0) 

64 


Figure 5: Polynomials which give an exact repair scheme for the (14,10)-GRS code used in [HFS] . 